MOSTOW, SAMPSON, MEYER
Fundamental Structures of Algebra


George D. Mostow 


YALE UNIVERSITY 


Joseph H. Sampson 


THE JOHNS HOPKINS UNIVERSITY 


Jean-Pierre Meyer 


THE JOHNS HOPKINS UNIVERSITY 




McGRAW-HILL BOOK COMPANY New York Toronto London


FUNDAMENTAL STRUCTURES OF ALGEBRA


Copyright © 1963 by McGraw-Hill Inc. All Rights Reserved.
Printed in the United States of America. This book, or parts thereof, 
may not be reproduced in any form without permission of the publishers. 


Library of Congress Catalog Card Number 638-18148




Preface 


The great evolution of the physical, engineering, and social sciences during 
the past half century has cast mathematics in a role quite different from
its familiar one of a powerful but essentially passive instrument for com- 
puting answers. In fact that view of mathematics was never a correct one, 
but it has had a strong influence in determining the standard undergradu- 
ate mathematics curriculum. Its inadequacy is becoming increasingly 
apparent with the growing recognition that mathematics is at the very 
heart of many modern scientific theories—not merely as a calculating de- 
vice, but much more fundamentally as the sole language in which the 
theories can be expressed. Thus mathematics plays an organic and crea- 
tive part in science, as a limitless source of concepts which provide fruitful 
new ways of representing natural phenomena. 

The view of mathematics as a calculating device, and the traditional 
pre-eminence of analytic geometry and calculus in the undergraduate 
mathematics curriculum, can be traced in part to the dominant influence
of classical physics, especially mechanics, and to the almost ineradicable 
prejudice in favor of expressing the laws of Nature in the form of simple 
mechanical analogies. This “billiard-ball” conception of Nature still
persists, but its limitations have been known for a long time; modern science
cannot confine itself to that naive conception, and that part of mathematics
with which it is linked, though utterly indispensable, is nonetheless
inadequate for science. An ability to deal fluently with abstract
systems has become a necessity. 

Yet algebra, the mathematics of abstract systems par excellence, has 
been commonly neglected in undergraduate curricula; indeed it is often 
omitted almost entirely from the mathematical education of science stu- 
dents, greatly to the detriment of their understanding of mathematics. 
The aim of this book is to acquaint students in the physical, engineering, 
and social sciences with the most important algebraic structures and with 
the mathematician’s way of discussing them. The book contains material 
which is not usually given before the junior or senior year of college, and 
much of the subject matter covered here is not generally presented to science
students at all. It may therefore seem surprising that the book was
designed to enable the student to begin his study of algebra at a very 
early stage of his undergraduate career. A preliminary mimeographed 
version was used with gratifying success in a course given by the authors
at Johns Hopkins University in 1960-1961. Among those who took the 
first half of the course were freshmen who contemplated more than one 
term of mathematics in college. (Entry of freshmen into the second half 
of the course was restricted to those in the top third of their class who 
studied linear algebra† simultaneously with a course in calculus.) Although
the material was of a level usually considered rather advanced,
it was our experience that it was understood and learned as readily by the 
freshmen as by those students who had taken the standard analytic 
geometry and calculus courses. The explanation for this appears to be 
that the lesser mathematical experience of the freshmen was more than 
compensated by their freedom from stubborn misconceptions and by their 
stimulation upon encountering something that was not just a prolongation 
of high school. Before the first term was over, we were able to communicate 
with them in the precise and lucid language of mathematics.

And it is in the language of mathematics that many pedagogical diffi- 
culties lie. Few persons have learned the precise use of even their native 
tongue properly; and the habits of precision are not easily acquired. That 
is a fortiori so in learning the unfamiliar language of mathematics. It is
not until the student is really well acquainted with the mathematical 
idiom that simple mathematical statements become simple for him. In 
order to communicate the idiom we have first applied it to rather familiar 
situations, and, as it is established, new subject matter is presented at an 
increasing rate. 

The second half of this book is devoted chiefly to a rather detailed ac- 
count of linear algebra and some of its applications. A student who com- 
pletes it will be well prepared for the use of linear algebra in science and 
engineering courses, as well as in advanced mathematics courses. Chapter 12
presents an application of linear algebra to the solution of differential
equations. It should be mentioned that Chapters 4 and 5 in this book
were included mainly for reference. By omitting most of those two chap- 
ters it is possible to put the rest of the material through Chapter 8 into one 
term. Chapter 15 contains a precise discussion of mappings, relations, 
and equivalence classes, as well as of several important constructions in- 
volving equivalence classes. The reader may find it profitable to refer to 
Chapter 15 several times during the course. Chapter 16 is concerned with
tensor algebra, a special but important part of linear algebra. Although a 
typical one-year course may stop short of Chapter 16, the material in it 
should not present great difficulty to students who have absorbed the main 
ideas in the earlier part of the book. 


† Cf. Instructor’s Manual.




The authors have been guided by the pedagogic philosophy that general 
abstract concepts are not really assimilated until employed in particular 
situations. As a result, this book makes excursions into special topics that 
are not usually found in a beginner's book on the foundations of algebra. 
Accordingly, the chapter on integers devotes considerable space to elemen- 
tary number theory; the chapter on polynomials contains the solution of 
cubic and biquadratic equations; the chapter on group theory contains the 
Sylow theorems. The special topics have been selected with two objec- 
tives in mind: (1) They impart important information. (2) They pro- 
vide pedagogically valuable opportunities for the manipulation of general 
concepts. 

In conclusion we should like to comment on the place of this book in the 
undergraduate program. As indicated above, our experience has been
that a one-year algebra course, early in the curriculum, accelerates the 
student’s mathematical development and enhances his ability to under- 
stand and learn mathematics. Moreover, it helps to provide a better- 
balanced preparation for scientific applications and for advanced mathematics
courses. And it should be noted that linear algebra is usually required
in science and engineering courses earlier than it is given in the
standard mathematics curriculum. Beginning students can understand 
algebra at this level, and therefore it is both practical and desirable to replace
the usual six-semester sequence of analytic geometry‡ and calculus
by a two-semester algebra and four-semester calculus sequence. This plan 
has the very important advantage that the superior student can take 
algebra and calculus simultaneously in his first year, being thereby enabled 
to complete the basic mathematics requirements in two years instead of 
three.

We take this opportunity to express our gratitude to Mr. James 
Sauvé, S.J., for his many important suggestions, for his invaluable contribution
in the preparation of the manuscript, and for helping us to avoid
errors and ambiguities.
George D. Mostow 


Joseph H. Sampson 
Jean-Pierre Meyer 


‡ Chapter 8 contains a considerable discussion of analytic geometry.


Contents


Preface
List of Special Symbols


1. Binary Operations and Groups
1. INTRODUCTION
2. SETS AND MAPPINGS
3. BINARY OPERATIONS
4. THE ASSOCIATIVE AXIOM
5. THE COMMUTATIVE AXIOM
6. GROUPS
7. ISOMORPHISMS AND HOMOMORPHISMS
8. RESTATEMENT OF THE GROUP AXIOMS
9. SYSTEMS WITH TWO BINARY OPERATIONS: RINGS, INTEGRAL DOMAINS, FIELDS


2. Rings, Integral Domains, the Integers
1. INTRODUCTION
2. SYSTEMS WITH TWO BINARY OPERATIONS: RINGS AND INTEGRAL DOMAINS
3. ORDERED INTEGRAL DOMAINS
4. THE SYSTEM OF INTEGERS
5. SOME COMMENTS
6. FINITE AND COUNTABLE SETS
7. MATHEMATICAL INDUCTION AND SOME OF ITS APPLICATIONS
8. SOME ELEMENTARY NUMBER THEORY
9. NOTATION FOR INTEGERS
10. MORE ELEMENTARY NUMBER THEORY: CONGRUENCES
11. PROOF OF THEOREM 4.40


3. Fields, the Rational Numbers
1. INTRODUCTION
2. FIELDS
3. THE FIELD OF RATIONAL NUMBERS
4. DECIMALS
5. THE BINOMIAL THEOREM


4. The Real-number System
1. INTRODUCTION
2. CAUCHY SEQUENCES AND LIMITS
3. THE FIELD OF REAL NUMBERS
4. SOME PROPERTIES OF R


5. The Field of Complex Numbers
1. THE SQUARE ROOT OF −1
2. A CONSTRUCTION OF C; QUATERNIONS
3. A GEOMETRIC INTERPRETATION OF ADDITION AND MULTIPLICATION OF COMPLEX NUMBERS
4. CAUCHY SEQUENCES AND INFINITE SERIES IN C


6. Polynomials
1. INTRODUCTION
2. INDETERMINATES, OR VARIABLES
3. FACTORIZATION OF POLYNOMIALS
4. ROOTS OF POLYNOMIALS
5. POLYNOMIALS IN SEVERAL VARIABLES
6. POLYNOMIALS OF DEGREE LESS THAN 5


7. Rational Functions
1. INTRODUCTION
2. RATIONAL FUNCTIONS
3. PARTIAL FRACTIONS


8. Vector Spaces and Affine Spaces
1. INTRODUCTION
2. THE BASIC DEFINITIONS
3. SOME CONSEQUENCES OF THE AXIOMS
4. SOME IMPORTANT EXAMPLES
5. SUBSPACES
6. LINEAR INDEPENDENCE AND DIMENSION
7. A THEOREM ON LINEAR EQUATIONS
8. ON THE DIMENSION OF VECTOR SPACES
9. BASE VECTORS
10. AFFINE SPACES
11. EUCLIDEAN SPACES
12. ANALYTIC GEOMETRY


9. Linear Transformations and Matrices
1. INTRODUCTION
2. A NOTATIONAL CONVENTION
3. LINEAR MAPPINGS
4. OPERATIONS ON LINEAR MAPPINGS
5. LINEAR TRANSFORMATIONS AND MATRICES
6. OPERATIONS ON MATRICES
7. CHANGE OF BASE
8. RANK OF A MATRIX; LINEAR EQUATIONS; SUBSPACES
9. REDUCTION TO DIAGONAL FORM
10. QUOTIENT SPACES
11. MODULES


10. Groups and Permutations
1. INTRODUCTION
2. BASIC PROPERTIES
3. PERMUTATIONS
4. SUBGROUPS AND QUOTIENT GROUPS
5. TRANSFORMATION GROUPS; SYLOW’S THEOREMS
6. THE JORDAN-HÖLDER THEOREM
7. FINITE ABELIAN GROUPS


11. Determinants
1. INTRODUCTION
2. AXIOMS FOR DETERMINANTS
3. SOME APPLICATIONS
4. THE CHARACTERISTIC POLYNOMIAL
5. EIGENVALUES AND EIGENVECTORS
6. DETERMINANTS AS VOLUMES


12. Rings of Operators and Differential Equations
1. INTRODUCTION
2. RINGS AND HOMOMORPHISMS
3. HOMOMORPHISMS OF RINGS
4. THE DIFFERENTIATION OPERATOR
5. SOME DIFFERENTIATION FORMULAS
6. LINEAR DIFFERENTIAL EQUATIONS WITH CONSTANT COEFFICIENTS
7. FINDING PARTICULAR AND GENERAL SOLUTION
8. TRIGONOMETRIC FUNCTIONS
9. SYSTEMS OF EQUATIONS
10. ONE-PARAMETER GROUPS AND INFINITESIMAL GENERATORS


13. The Jordan Normal Form
1. INTRODUCTION
2. ELEMENTARY LINEAR MAPPINGS
3. DIRECT SUM DECOMPOSITIONS
4. NILPOTENT MAPPINGS
5. CHARACTERISTIC SUBSPACES
6. THE JORDAN NORMAL FORM
7. UNIQUENESS OF THE JORDAN NORMAL FORM
8. THE PROBLEM OF SIMILARITY
9. ELEMENTARY DIVISORS
10. ELEMENTARY DIVISORS AND SIMILARITY
11. MODULES, TORSION ORDERS, AND THE RATIONAL CANONICAL FORM
12. FINITELY GENERATED ABELIAN GROUPS


14. Quadratic and Hermitian Forms
1. INTRODUCTION
2. LINEAR FUNCTIONS; DUAL SPACES
3. BILINEAR FUNCTIONS
4. QUADRATIC FORMS
5. REDUCTION TO DIAGONAL FORM
6. HERMITIAN FORMS; UNITARY MAPPINGS
7. EUCLIDEAN VECTOR SPACES
8. ORTHONORMAL BASES
9. FOURIER SERIES, BESSEL’S INEQUALITY
10. THE EIGENVALUES OF A HERMITIAN MATRIX
11. SIMULTANEOUS DIAGONALIZATION OF TWO HERMITIAN FORMS
12. UNITARY MATRICES
13. VECTOR PRODUCTS IN ORIENTED 3-SPACE
14. ANALYTIC GEOMETRY IN 2 DIMENSIONS


15. Quotient Structures
1. MAPPINGS
2. RELATIONS
3. QUOTIENT SET
4. BINARY OPERATIONS ON QUOTIENT SETS
5. THE CONSTRUCTION OF THE FIELD OF QUOTIENTS OF AN INTEGRAL DOMAIN
6. THE CONSTRUCTION OF THE FIELD OF REAL NUMBERS FROM THE FIELD OF RATIONAL NUMBERS
7. THE CONSTRUCTION OF A FIELD CONTAINING A ROOT OF A POLYNOMIAL
8. A PARADOX TO AVOID
9. BERNSTEIN’S THEOREM ON CARDINAL NUMBERS


16. Tensors
1. INTRODUCTION
2. TENSOR PRODUCTS
3. TENSOR PRODUCTS OF MORE THAN TWO FACTORS
4. TENSOR PRODUCTS OF MAPPINGS
5. THE TENSOR ALGEBRA OF A VECTOR SPACE
6. BASES AND COMPONENTS
7. CONTRACTION OF TENSORS
8. SYMMETRY PROPERTIES
9. THE METRIC
10. THE EXTERIOR ALGEBRA
11. PLÜCKER COORDINATES; DUALITY
12. SKEW-SYMMETRIC TENSORS


Index

List of special symbols


[illegible]       the system of positive integers, p. 36
[illegible]       the ring of integers, p. 36
[illegible]       the ring of residue classes modulo m, p. 72
[illegible]       the field of rational numbers, p. 86
R                 the field of real numbers, p. 110
C                 the field of complex numbers, p. 119
[illegible]       matrix element in jth row and ith column, p. 229
[illegible]       jth row of a matrix, p. 229
[illegible]       ith column of a matrix, p. 229
[illegible]       the transpose of a matrix, p. 235
[illegible]       the adjoint of a matrix, p. 314
[illegible]       Kronecker delta, p. 230
Tr a              trace of a matrix, p. 311
det a             determinant of a matrix, p. 308
mat_{B,B′} T      the matrix of a linear mapping with respect to a base pair B, B′
[illegible]       respectively the sets of m × 1, 1 × n, and m × n matrices with
                  coefficients in K, p. 229
Ker T, Im T       p. 218
Hom (U, V)        p. 222
f: S → T          mapping symbol, p. 2
f ∘ g             composition of mappings, p. 3
<f, x>            inner product, p. 223
u × v             vector product, p. 482
A × B             cartesian product, pp. 495, 514
u ∧ v             exterior product, p. 552
[illegible]       tensor product symbol, p. 438 and Chap. 16
[illegible]       direct sum, pp. 178, 382, 385
[illegible]       dual operator, p. 565
[illegible]       number of elements in the set E, p. 277
[illegible]       intersection, p. 279
[illegible]       isomorphism, pp. 17, 293
(a)               cyclic subgroup generated by a, p. 267
End V             algebra of endomorphisms of V, p. 342
T₁ ~ T₂           similarity, p. 408
GL(V)             general linear group, p. 532
B[c]              adjunction of c to a ring B, p. 133
K(x)              field of rational functions, p. 163
[a, b]            segment, p. 38
(a, b)            greatest common divisor, pp. 57, 142




Binary operations and groups 


1. Introduction 


The axioms that we shall assume in this chapter are inspired by experience with 
ordinary numerical calculations—for example, addition and multiplication. Each 
of these operations associates with two given numbers another number, their sum 
in the case of addition and their product in the case of multiplication. Here we 
shall investigate the consequences of assuming that we have some kind of operation 
that associates with two objects (not necessarily numbers) another object. The 
operation will be assumed to obey certain rules, or axioms. In this chapter we 
shall set down two main axioms, and they will be the basis for the definition of a 
mathematical system called a group. The same axioms will occur repeatedly in 
later chapters. These axioms require some explanation, and they are discussed at 
length in the following sections. 


2. Sets and mappings 


We begin by mentioning some terminology which is constantly used in modern 
mathematics. The word set will be understood as synonymous with collection. A 
set is then a collection of certain objects, usually called the elements of the set. 
Sets and their elements will often be denoted by letters of the alphabet. If S 
denotes a set, such a phrase as “let x be an element of S” means the following: 
we wish to talk about an arbitrary element selected from S, an element to be con- 
sidered as fixed throughout the discussion at hand. In order to be able to talk 
about it conveniently, we give it a temporary name x (or whatever other symbol 
happens to be convenient). Thus x here does not denote an element fixed once 
for all in S (unless stated otherwise), but rather a “variable” or “arbitrary” 
element; its meaning is fixed only for the discussion that one has in mind. 

A part of a set S is of course itself a set. A part of S is also called a subset of S. 

We illustrate with some very simple examples—indeed trite examples. They 
merely serve to illustrate how the language is used, and we shall not bother about 
great precision at this point. 


EXAMPLE 1 Let S denote the set of all cards in an ordinary deck. Thus S contains
52 elements, and we can say that if x is an element of S, then x bears one of the
symbols 2, 3, . . . , 10, J, Q, K, A, and one of the colors red or black. We can
define a subset R of S by declaring that R shall consist of all elements x in S which
are red. Here we have imposed a condition on x; it now stands for an arbitrarily
selected red card in S. That is, the subset R consists of all hearts and diamonds.
In turn we can define a subset W of R by the condition that an element y of R is
in W if and only if y bears the number 3. In other words, W consists of the 3 of
hearts and the 3 of diamonds. A further subset T of S is defined by the condition
that T shall consist of all elements z in S such that z bears a black diamond. This
condition is not a priori unreasonable, for someone who plays only bésigue or
roulette might well be unaware of the composition of a bridge deck. The reader will
of course be aware that the subset T just defined contains no cards; for convenience
of language we call T the empty subset of S.


EXAMPLE 2 Odd numbers form a subset U of the set N of whole numbers. Let V
be the subset of U consisting of all elements n such that n² leaves a remainder of
3 upon division by 4. It is not hard to see that V is empty, but that is not quite
so obvious as in the case of T in the preceding example.
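
The emptiness of V can be made plausible by direct computation. The following is a minimal sketch in Python, a language chosen here purely for illustration; the function name and the bound of 1000 are assumptions and not part of the text. The underlying reason is that (2k + 1)² = 4k(k + 1) + 1, so that the square of an odd number always leaves the remainder 1 upon division by 4.

```python
def odd_square_remainders(bound):
    """Return the set of remainders of n*n upon division by 4, for odd n below bound."""
    return {(n * n) % 4 for n in range(1, bound, 2)}


# The part of V lying below 1000 (the bound is an arbitrary assumption).
V_sample = [n for n in range(1, 1000, 2) if (n * n) % 4 == 3]
remainders = odd_square_remainders(1000)
print(remainders, V_sample)
```

A finite check of this kind proves nothing about all odd numbers; it only illustrates the statement that the identity above establishes in general.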

One of the most basic notions occurring in mathematics is that of mapping. A
mapping of a set S to a set T is a rule, or an operation, which assigns to every element
in S a definite element in T. (The sets S and T need not be different.) Mappings
are also called functions or, sometimes, operators. Mappings are customarily denoted
by letters. If f denotes a mapping of S to T, then the element of T which f
assigns to an element x of S is often denoted by f(x); one says that the mapping
f sends (or maps) the element x into the element f(x), and f(x) is called the image
of x. The notation f: S → T is useful to indicate that f is a mapping from S to T.

Some further useful notation is as follows: B being a subset of S, f(B) denotes
the subset of T consisting of all f(x) for which x is in B; f(B) is called the image of B
under f. C being a subset of T, f⁻¹(C) denotes the subset of S consisting of all
elements x such that f(x) is in C.


EXAMPLE 3 Assigning to every card in a bridge deck S its suit defines a mapping
of S to the set of four elements consisting of hearts, diamonds, clubs, spades.

Assigning to every card in S the king in the same suit defines a mapping f of S
to itself. Thus f(8D) = KD, f(4D) = KD, etc.; and f⁻¹(KD) denotes the set of
all 13 cards in the suit of diamonds. Observe that f⁻¹(3D) is the empty set. The
mapping f can also be considered as a mapping of S to the subset K consisting of
the four kings.
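
The image f(B) and the inverse image f⁻¹(C) of Example 3 can be computed explicitly. The sketch below is only an illustration: the encoding of a card as a rank followed by a suit letter, and the helper names image and preimage, are assumptions introduced here and do not come from the text.

```python
RANKS = ["2", "3", "4", "5", "6", "7", "8", "9", "10", "J", "Q", "K", "A"]
SUITS = ["H", "D", "C", "S"]  # hearts, diamonds, clubs, spades

# The deck S: each card is written as rank followed by suit, e.g. "3D".
S = {rank + suit for rank in RANKS for suit in SUITS}


def f(card):
    """Send a card to the king of the same suit."""
    return "K" + card[-1]


def image(mapping, B):
    """f(B): the set of all mapping(x) for x in B."""
    return {mapping(x) for x in B}


def preimage(mapping, domain, C):
    """f^{-1}(C): the set of all x in the domain whose image lies in C."""
    return {x for x in domain if mapping(x) in C}


kings = image(f, S)                   # the four kings
diamonds = preimage(f, S, {"KD"})     # every card that f sends to KD
nothing = preimage(f, S, {"3D"})      # no card is sent to the 3 of diamonds
```

Here the inverse image of the king of diamonds is the whole suit of diamonds, while the inverse image of the 3 of diamonds is empty because f only ever produces kings.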


EXAMPLE 4 Assigning to every card in a bridge deck S the corresponding card in a
second deck S’ defines a mapping of S to S’.

EXAMPLE 5 Assigning to the radius of a circle the area of that circle defines a
mapping of the set of positive numbers to itself.

A mapping f of a set S to a set T is called a one-to-one mapping, or correspondence,
if the following conditions hold:

(a) For every element y in T there is an element x in S such that f(x) = y.
(b) If x and x′ are two different elements of S, then f(x) ≠ f(x′).

The first condition asserts that for each element of T there is at least one element
of S which is sent into it by f; and the second condition asserts that different elements
of S are not sent by f into the same element of T. Thus, to every element
in S there corresponds one and only one element in T, and vice versa.

Figure 1 The projection, from the eyepoint, of the spherical zone S upon the
plane T is a mapping f of S to T. If the plane T is tangent to the sphere, and the
eyepoint lies at the end of the diameter through the point of tangency, such a
mapping is called a stereographic projection.

The mappings of Example 3 are clearly not one-to-one; but the mapping of 
Example 4 is a one-to-one correspondence. 

A one-to-one mapping of a set to itself is sometimes called a permutation. Thus
the simple cryptographic codes obtained by scrambling the alphabet depend upon
permutations, or one-to-one correspondences, of the letters of the alphabet. 

If a mapping f: S → T is one-to-one, then we can define a mapping from T to S
by assigning to each y of T the unique element x of S that f sends into y. This
new mapping is usually denoted by f⁻¹ and is called the inverse mapping of f.
Clearly f⁻¹ is also one-to-one.
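
Conditions (a) and (b), and the construction of the inverse mapping, can be tried out on a small scrambling of the alphabet. The sketch below is illustrative only: representing a mapping as a Python dictionary, the names is_one_to_one and inverse, and the particular three-place shift are all assumptions made for the example.

```python
import string


def is_one_to_one(f, S, T):
    """Check conditions (a) and (b) for a mapping f given as a dict from S to T."""
    images = [f[x] for x in S]
    condition_a = set(images) == set(T)            # every y in T is the image of some x
    condition_b = len(set(images)) == len(images)  # different x have different images
    return condition_a and condition_b


def inverse(f):
    """Assign to each y the unique x that f sends into y (f must be one-to-one)."""
    return {y: x for x, y in f.items()}


ALPHABET = string.ascii_uppercase
# A hypothetical code: shift every letter three places, wrapping around at Z.
cipher = {c: ALPHABET[(i + 3) % 26] for i, c in enumerate(ALPHABET)}
decipher = inverse(cipher)

# A mapping that is not one-to-one: every letter of "ABC" is sent to "K".
constant = {c: "K" for c in "ABC"}
```

Applying cipher and then decipher returns every letter to itself, which is what one expects of a mapping followed by its inverse.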


EXAMPLE 6 Given any set S whatever, the mapping f from S to S that assigns each
element of S to itself is certainly one-to-one. This mapping, as trivial as it seems, 
arises often enough to deserve a special name: it is called the identity mapping of S. 
In symbols it is characterized by f(x) = x for all x in S. 

Suppose now that we are given a mapping f of a set S to a set T and another
mapping g of T to a set U. To every element x in S the first mapping assigns some
element f(x) in T; to every element y of T the second mapping g assigns an element
g(y) in U; in particular, to f(x) it assigns an element g(f(x)) in U. Therefore
the operation that assigns to x in S the element g(f(x)) in U is a mapping from S
to U. It is called the composition of f and g and is denoted by g ∘ f, or simply by
gf. Thus g ∘ f(x) = g(f(x)).

For a “capsule” definition we can say simply that g ∘ f is the operation obtained
by first applying f and then g.
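
The capsule definition translates directly into a few lines of code. In the sketch below the helper compose and the two sample mappings are assumptions introduced for illustration; a "name" is taken, for simplicity, to be a string of words separated by spaces, with the last word as the surname.

```python
def compose(g, f):
    """Return the mapping g o f, which first applies f and then g."""
    return lambda x: g(f(x))


def last_name(full_name):
    """First mapping: a full name is sent to its last word."""
    return full_name.split()[-1]


def first_letter(word):
    """Second mapping: a word is sent to its first letter."""
    return word[0]


# The composition sends a full name to the initial of the surname.
initial = compose(first_letter, last_name)
print(initial("Emmy Noether"))
```

Note the order of the arguments: in compose(g, f), as in the notation g ∘ f, the mapping written on the right is the one applied first.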


EXAMPLE 7 If the first mapping assigns to every human being its last name, and
the second assigns to every proper name its first letter, then the composition of the
given mappings is the mapping from the set of human beings to the alphabet which
assigns to every human being his surname’s initial.

Figure 2 Stereographic projection sends circles on the sphere into circles on the
plane, except for the great circles passing through the point of tangency; these great
circles are mapped into straight lines.

Figure 3 Mercator’s projection from his World Map of 1569. Mercator’s delineation
of the land is shaded. This map, in common with all other geographical maps,
provides a mapping of (a subset of) an idealized earth’s surface.


Some of the examples of sets and mappings given above involve sets of real 
objects (cards, etc.). The sets that occur in mathematics always consist of elements
of an abstract nature. And most often the real point of interest is not the
nature of the elements in a given set but rather the relationships that happen to 
exist among the elements. 

The notions of set and mappings lie at the very foundations of mathematics, and
we wish to avoid completely any discussion of that aspect of things. For that
reason we have limited ourselves here to explaining a few terms that are nearly
indispensable. However some of these are taken up again briefly in Chap. 15. 


EXERCISE 


Let h: S → T be a one-to-one mapping, and let h⁻¹ be its inverse. What are
the composite mappings h ∘ h⁻¹ and h⁻¹ ∘ h?


3. Binary operations 


In this section we discuss a general kind of mathematical operation which includes 
as special cases the familiar operations of arithmetic, as well as many others that 
will be of importance later in the book. 


DEFINITION 3.1 A binary operation in a set S is a rule, or operation, which assigns 
to any pair of elements in S, taken in a definite order, another element in the same 
set S.

For the present let us indicate such an operation by *. Thus, if a and b are
elements of the set S, then the * operation assigns to the pair a, b some element of
S. We shall denote it by a * b. The operation also assigns to the pair b, a an
element, namely b * a. Observe that the definition tells us nothing about the
relation of a * b to b * a. In fact, it tells us so little about * that we could not
expect to prove any very surprising theorems at this point.

Let us take time to point out some simple but important things about *. First
of all, the symbol * used to represent the operation is of no importance, and
presently we shall use other symbols which are more convenient. Moreover, the
same set S may be equipped with several different binary operations, as we shall
soon see.

The adjective binary is used because the operation produces from a pair of
elements of S (taken in a specific order) some new element of S. If we select elements
a, b, c of S (not necessarily different) there are various ways in which we can
combine them by the operation *. For example, to the pair a, b our operation
assigns the element a * b, and we can then use * again to combine that element
with c. We denote the result by (a * b) * c. To repeat, this symbol means the
following: apply * to the pair a, b obtaining a * b, and then apply * to the pair
a * b, c. In like manner we can apply * to the pair a, b * c, obtaining an element
which we write as a * (b * c). The parentheses are necessary to indicate just how
the elements are to be grouped into pairs. It would make no sense to write down
simply a * b * c because * can be applied only to pairs of elements. We shall soon
see, however, that parentheses can be safely omitted under certain circumstances.
Note that if we change the order of a, b, c in the foregoing we can write down
still other combinations, for example, b * (a * c). Similarly, if a, b, c, d are elements
of S, we can form a * (b * (c * d)) or (a * b) * (c * d) and many others.
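
To see how many such combinations there are when the order of the elements is kept fixed, one can list every way of grouping them into pairs. The recursive function below is a sketch written for this purpose; its name and its output format (with an outermost pair of parentheses that the text omits) are assumptions of the example.

```python
def groupings(symbols):
    """List every way of inserting parentheses into symbols[0] * ... * symbols[-1],
    keeping the order of the symbols fixed."""
    if len(symbols) == 1:
        return [symbols[0]]
    results = []
    # The last application of * combines a left block with a right block.
    for split in range(1, len(symbols)):
        for left in groupings(symbols[:split]):
            for right in groupings(symbols[split:]):
                results.append(f"({left} * {right})")
    return results


for expression in groupings(["a", "b", "c", "d"]):
    print(expression)
```

For three elements the function produces the two groupings discussed above; for four elements it produces five, among them the two mentioned in the text.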
Throughout this book the equality sign “=” is taken to mean “the same as.”
We take occasion here to point out some very important but very simple facts
about =. Let a and b be elements of the set S we have been discussing, and suppose
that a = b; that is, the letters a and b denote the same element of S, or in
other words they are two names for the same element of S. It follows that if c
is any element of S, then a * c = b * c, for the two sides of this equation are still
simply two names of the same element of S. Similarly, if a, a′, b, b′ are elements of
S such that a * b = a′ * b′, then a * b and a′ * b′ are two names for the same
element of S, and so it follows that (a * b) * c = (a′ * b′) * c for any element
c of S, and also c * (a * b) = c * (a′ * b′). We might express this principle briefly
by saying that “equals ‘starred’ with equals are equal.” Observe, however, that
this is a simple consequence of the meaning of equality and holds for all binary
operations.

EXAMPLE 1 The following are binary operations: addition of whole numbers;
multiplication of whole numbers; subtraction of whole numbers (by which we
mean the operation that assigns to a pair of numbers a, b their difference a − b).


EXAMPLE 2 Let S be a set and M be the set of all mappings of S to itself. The 
composition of mappings defines a binary operation in M, since the composition 
of two mappings of S to itself is again a mapping of S to itself. 

The way in which * operates upon pairs of elements of S can be written down
in a “table,” at any rate if S does not have an unreasonably large number of elements.
To give some concrete examples let us consider a set S consisting of elements
represented by the letters p, q, r, s (these letters now stand for fixed elements
of S, not for arbitrary elements, as with a, b, c above). The following tables†
define binary operations which we denote by ×, ◇, ∘, +.


TABLE 1, FOR ×

      | p  q  r  s
    --+-----------
    p | q  s  q  q
    q | p  p  s  s
    r | q  p  q  r
    s | r  p  s  r


† These tables are for illustrative purposes and have no particular interest beyond that.



The table is to be read as follows: to find, say, p × q look for the element at the
intersection of the p-row and q-column. According to the table, p × q = s;
similarly, q × p = p, r × s = r, and so on. Note that p × q and q × p are
different elements.


TABLE 2, FOR ◇

      | p  q  r  s
    --+-----------
    p | q  q  q  q
    q | q  q  q  q
    r | q  q  q  q
    s | q  q  q  q

Here p ◇ q = r ◇ s = · · · = q.

TABLE 3, FOR ∘

      | p  q  r  s
    --+-----------
    p | p  q  r  s
    q | q  p  s  r
    r | r  s  p  q
    s | s  r  q  p


Note that, here, x ∘ y = y ∘ x for any two elements x, y selected from p, q, r, s.
A similar statement holds for the operation defined in the preceding table, as well
as in the following.


TABLE 4, FOR +

      | p  q  r  s
    --+-----------
    p | p  q  r  s
    q | q  r  s  p
    r | r  s  p  q
    s | s  p  q  r


Note that, here, p + x = x + p = x for any element x selected from p, q, r, s.
An analogous statement holds for the preceding example. In both cases p is
said to be an identity element for the particular binary operation ∘ or + in question.
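
The two remarks just made about Tables 3 and 4 can be checked mechanically. In the sketch below each table is stored so that table[x][y] is the element in the x-row and y-column; the entries are a reading of the printed tables and should be treated as an assumption, as should the helper names make_table, is_commutative and fixes_everything.

```python
ELEMENTS = ["p", "q", "r", "s"]


def make_table(rows):
    """Build table[x][y] from four rows, listed in the order p, q, r, s."""
    return {x: dict(zip(ELEMENTS, row.split())) for x, row in zip(ELEMENTS, rows)}


# Entries as read from Tables 3 and 4 (an assumption of this sketch).
TABLE_3 = make_table(["p q r s", "q p s r", "r s p q", "s r q p"])
TABLE_4 = make_table(["p q r s", "q r s p", "r s p q", "s p q r"])


def is_commutative(table):
    """True if the x-row, y-column entry always equals the y-row, x-column entry."""
    return all(table[x][y] == table[y][x] for x in ELEMENTS for y in ELEMENTS)


def fixes_everything(table, e):
    """True if combining e with any x, on either side, gives back x."""
    return all(table[e][x] == x and table[x][e] == x for x in ELEMENTS)


print(is_commutative(TABLE_3), is_commutative(TABLE_4))
print(fixes_everything(TABLE_3, "p"), fixes_everything(TABLE_4, "p"))
```

Reading an entry follows the rule stated for Table 1: TABLE_4["q"]["s"] is the element at the intersection of the q-row and the s-column.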

By now it should be fairly obvious that many other binary operations can be
defined in the set S consisting of p, q, r, s. In fact, to define such an operation it
is only necessary to fill the 16 blank spaces of a table with the letters p, q, r, s.
Since there are four choices for each of the blank spaces, there are all together
4¹⁶ = 4,294,967,296 possible binary operations in S. The reader will be heartened
to learn that only two of these are of any real mathematical interest, namely the
ones given in Tables 3 and 4. The operation defined by Table 2 is clearly very
trivial in nature. The authors arrived at Table 1 by filling in the 16 spaces at
random. Tables 3 and 4 are tables of groups, which we shall define presently.
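
The count of possible tables is easy to confirm. The short sketch below merely restates the counting argument: a table for a set of n elements has n·n spaces, each of which can be filled in n ways. The function name is an assumption of the example.

```python
def count_binary_operations(n):
    """Number of ways to fill an n-by-n table with n letters."""
    return n ** (n * n)


total = count_binary_operations(4)
print(f"{total:,}")
```

For n = 4 the same number can be written as 2³², since 4¹⁶ = (2²)¹⁶.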

The use of tables like those above becomes awkward and inconvenient for sets 
S containing many elements. Most of the sets of importance to us contain an
infinite number of elements, and for such sets it is wholly impracticable to write
down a table. We shall soon find other more convenient means of describing 
binary operations. 

Referring back to the “identity’’ element p of Tables 3 and 4, we are led to 
make a general definition: 


DEFINITION 3.2 S being a set with a binary operation *, an element e of S is called
an identity element for * if e * a = a and a * e = a for every element a of S.

An identity element in S (if there is one) plays a role analogous to that of zero
for addition and of unity for multiplication. Indeed 0 + a = a + 0 = a for
any integer a, and so 0 is the identity element for addition of integers. Similarly,
1 · a = a · 1 = a, and 1 is the identity element for multiplication.


THEOREM 3.1 Let S be a set with a binary operation *. If there is an identity
element for * in S, then there is only one.

Proof. Suppose that e and e' are both identity elements in S. Then
by Definition 3.2 we have e * a = a for any element a of S. In particular,
e * e' = e'. Similarly from Definition 3.2 we have a * e' = a for any
element a of S. In particular, e * e' = e. Hence e = e'. Q.E.D.
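
For a finite table, Definition 3.2 can be tested directly by trying every element in turn. The sketch below is an illustration under our own conventions: the dictionary `TABLE_4` transcribes Table 4 row by row, and the names `plus` and `identity_elements` are hypothetical helpers, not notation from the text. In agreement with Theorem 3.1, the search never returns more than one element.

```python
ELEMENTS = "pqrs"

# Rows of Table 4 (operation +), read left to right under the columns p, q, r, s.
TABLE_4 = {
    "p": "pqrs",
    "q": "qrsp",
    "r": "rspq",
    "s": "spqr",
}


def plus(x, y):
    """Look up x + y in Table 4: row x, column y."""
    return TABLE_4[x][ELEMENTS.index(y)]


def identity_elements(elements, op):
    """All e with e * a = a and a * e = a for every a (Definition 3.2)."""
    return [
        e
        for e in elements
        if all(op(e, a) == a and op(a, e) == a for a in elements)
    ]


print(identity_elements(ELEMENTS, plus))  # ['p']
```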


EXERCISES 
1. Compute the following expressions from each of the four tables above,
where * denotes, in each case, one of the four operations of Tables 1 to 4:
(@) D*D.g* tT D8, De 
(8) (pep) @er)) «a (p+ (pe(ger))) +4, (8* (p* 8) + 9))* (D* 9). 
2. How many binary operations can be defined in a set of two elements? In
a set of seven elements? How many of these operations satisfy the commutative
law x * y = y * x?
3. Is there an identity element for the operation × of Table 1?


4. The associative axiom 


DEFINITION 4.1 Let S be a set with a binary operation *. The operation is said to
be associative if, given any elements a, b, c in S, the following equality holds:


(4.1)     (a * b) * c = a * (b * c)


We also say that * satisfies the associative axiom, or law, if (4.1) holds.

This is a very important axiom, and most of the binary operations which we
shall encounter will satisfy it. Referring to the four tables of Sec. 3, the reader
can verify that the last three define associative operations, in contrast with
the first one, which does not: (p × q) × r = s and p × (q × r) = q. To give
another example, division of numbers is a binary operation which is not
associative.
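
Verifying the associative axiom for a table means checking all triples of elements, which is tedious by hand but immediate by machine. The following sketch is illustrative only; `failing_triples` is a hypothetical helper of our own, `TABLE_4` transcribes Table 4, and the three sample numbers are an arbitrary choice. A single failing triple is enough to show that division is not associative.

```python
from fractions import Fraction
from itertools import product

ELEMENTS = "pqrs"

# Rows of Table 4 (operation +), read left to right under the columns p, q, r, s.
TABLE_4 = {"p": "pqrs", "q": "qrsp", "r": "rspq", "s": "spqr"}


def plus(x, y):
    """Look up x + y in Table 4: row x, column y."""
    return TABLE_4[x][ELEMENTS.index(y)]


def divide(a, b):
    """Division of (nonzero) numbers, kept exact by using fractions."""
    return a / b


def failing_triples(elements, op):
    """All triples (a, b, c) for which (a * b) * c differs from a * (b * c)."""
    return [
        (a, b, c)
        for a, b, c in product(elements, repeat=3)
        if op(op(a, b), c) != op(a, op(b, c))
    ]


numbers = [Fraction(1), Fraction(2), Fraction(3)]  # an arbitrary sample

print(failing_triples(ELEMENTS, plus))         # [] : all 64 triples agree
print(bool(failing_triples(numbers, divide)))  # True : division fails (4.1)
```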

We can now prove a simple result which is not merely a restatement of the 
definition. 


PROPOSITION 4.1 If the operation * is associative, then for any elements a, b, c, d in S
we have
          ((a * b) * c) * d = (a * b) * (c * d)

Proof. Denote a * b temporarily by x. Then, since * is associative,
(x * c) * d = x * (c * d). Replacing x by a * b (another name for the
same element!) we obtain the desired conclusion. It is of course necessary
to enclose a * b in parentheses, as explained in the preceding section.


An examination of Proposition 4.1 and Exercises 1 and 2 below discloses that
the associative axiom permits us to shift parentheses in a step-by-step fashion and
suggests the following very general result:


THEOREM 4.2 Let S be a set with an associative binary operation *. Then if any
combination of elements of S is formed by means of repeated application of the
operation *, the result does not depend upon the grouping of the elements involved. In other
words, the way in which parentheses appear in the combination does not affect the
result.

The assertion of this theorem is quite easily proved in any given special case 
where the number of elements involved is relatively small, and the truth of the 
theorem soon becomes rather obvious. A complete proof of the theorem, however, 
is rather involved and is not particularly instructive. Moreover it requires the 
use of mathematical induction, which will not be treated until the next chapter. 
We shall omit the proof of Theorem 4.2 in order to get on quickly with more
interesting matters. Nonetheless the theorem is of great usefulness and permits an
enormous simplification of notation. Thus, if a, b, c, d are any elements in S and if
the operation * is associative, then the expressions


     ((a * b) * c) * d        (a * b) * (c * d)        a * (b * (c * d))


all represent the same element (Proposition 4.1 and Exercise 1). Hence no
ambiguity can arise if we denote that element simply by a * b * c * d. We can
similarly omit parentheses in combinations of more elements. For the remainder
of this chapter, however, we shall consistently retain parentheses in order to
emphasize the use of the associative axiom.


At the end of Sec. 3 we defined what is meant by an identity element for a 
binary operation. Closely connected with that is the notion of inverse element: 


DEFINITION 4.2 Let S be a set with a binary operation *, and suppose that S contains
an identity element e for that operation (e is necessarily unique, by Theorem 3.1).
If a is an element of S and if there is an element a' in S such that a * a' = e and
a' * a = e, then a' is called an inverse element of a.


EXAMPLE Referring to Table 3, Sec. 3, p is the identity element; r has an inverse,
namely, r itself. Similarly for q and s. For Table 4, p is again the identity; q has
the inverse s, p has the inverse p.

THEOREM 4.3 Let S and * be as in Definition 4.2, and suppose furthermore that * is
associative. Then no element of S can have more than one inverse.

Proof. Let a' and a'' be inverse elements for some element a of S.
Then by definition we have a * a' = a' * a = e and a * a'' = a'' * a = e.
By the associative law, (a' * a) * a'' = a' * (a * a''). Putting e for a' * a
on the left and for a * a'' on the right, we get e * a'' = a' * e, and so
a' = a'', by Definition 3.2. Q.E.D.

Remark. In virtue of this theorem we can speak of the inverse element of a (if 
there is one and if * is associative). For example, —5 is the inverse of 5 (and vice 
versa) if the binary operation is taken to be addition of numbers; one-fifth is the 
inverse of 5 if multiplication is the binary operation. 


THEOREM 4.4 Let S be a set with an associative binary operation * and containing
an identity element e for *. If a' is the inverse of an element a, then a is the inverse
of a'. If further b' is the inverse of an element b, then b' * a' is the inverse of a * b.
Finally, e is its own inverse.

Proof. The first assertion follows at once from the symmetry of
Definition 4.2 and from Theorem 4.3. To prove the second assertion, we
have (using the associative axiom)

     (b' * a') * (a * b) = b' * ((a' * a) * b) = b' * (e * b) = b' * b = e

Similarly, (a * b) * (b' * a') = e, and so the element b' * a' has the
properties required by Definition 4.2. Therefore b' * a' is the unique inverse
of a * b, by Theorem 4.3. Finally, e * e = e, so that e is its own inverse,
by Definition 4.2 and Theorem 4.3. Q.E.D.


THEOREM 4.5 (Cancellation theorem) Let S be a set with an associative binary
operation * and containing an identity element e for *. Let c be an element of S which
has an inverse. Then if a * c = b * c or c * a = c * b for elements a, b of S, it follows
that a = b.

Proof. Let c' be the inverse element of c. Thus c * c' = c' * c = e.
Assuming now that a * c = b * c, we have, by “starring” with c' on both
sides, (a * c) * c' = (b * c) * c' (recall the discussion of equality in Sec. 3!).
Using the associative law we get a * (c * c') = b * (c * c'), or a * e =
b * e. Therefore a = b, by definition of identity element. The argument
for the other case c * a = c * b is similar. Q.E.D.


The theorem tells us that an element appearing on both sides of an equation
can be “canceled out,” provided that it has an inverse and provided that it occurs
on both sides either on the extreme left or extreme right. Observe that Theorem
4.5 tells us nothing about the possibility of cancellation in such an equation as
a * c = c * b. In some systems which we shall study later, cancellation is
legitimate even for elements which do not have inverses.


EXERCISES 
1. If the binary operation * is associative, prove that
          ((a * b) * c) * d = a * (b * (c * d))
for any elements a, b, c, d.
2. Similarly, prove that
          (((a * b) * c) * d) * e = a * ((b * c) * (d * e))
for any elements a, b, c, d, e.

3. Show that the associative axiom is satisfied by the operations ∘ and +
defined in Sec. 3.

4, Write out the definition of an associative binary operation entirely in words 
without the use of any symbols. (This exercise should convince almost anyone 
of the great utility of using letters to represent arbitrary, or variable, elements.) 

5. Of the binary operations of Examples 1, 2, 3, Sec. 3, which are associative 
and which are not? 

6. Write down the inverses for each of the elements p, q, r, s for the binary
operations of Tables 3 and 4, Sec. 3. Do any of these elements have inverses for
the operations of Tables 1 and 2?

7. Show by an example from Table 1, Sec. 3, that cancellation is not valid as a
general rule in that system. What about the systems defined by Tables 2 and 3?


5. The commutative axiom


As we have just seen, the associative axiom allows us to omit parentheses in
combinations of several elements of S. However, the order in which the elements
occur is in general quite essential. For example, we will have in general
(a * b) * c ≠ (b * a) * c. Nevertheless many important binary operations are
indifferent to the order in which the elements are written. More precisely, we give
the following definition:


DEFINITION 5.1 Let S be a set with a binary operation *. The operation is said to
be commutative if, given any elements a, b in S, the following equality holds:


(5.1)     a * b = b * a


We also say that * satisfies the commutative axiom, or law, if (5.1) holds.
Note that this definition is completely independent of the associative axiom,
discussed in Sec. 4 (cf. Exercise 3 at the end of this section).


EXAMPLE 1 Addition of whole numbers is a commutative binary operation.


EXAMPLE 2 Subtraction of whole numbers is not a commutative binary operation
(a - b ≠ b - a in general).

We do not count the commutative axiom as one of the main axioms of this
chapter, and we shall not assume that it is satisfied unless we explicitly say so.
Exercise 2 below and Theorem 4.2 suggest the following general theorem. The
remarks concerning the proof of Theorem 4.2 are equally applicable here, and we
shall omit the proof.


THEOREM 5.1 Let S be a set with an associative and commutative binary operation *.
Then if any combination of elements of S is formed by means of repeated application
of *, the result depends neither upon the grouping of the elements involved nor upon
their order.
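
Although the proof is omitted, the assertion of Theorem 5.1 can be tested exhaustively in a small case. The sketch below is only an experiment, not a proof: it uses the operation + of Table 4 (transcribed in `TABLE_4`), and the helper `all_groupings` and the particular combination q, r, s, q are our own choices. It forms every ordering of the four elements and every way of inserting parentheses, and collects the results.

```python
from itertools import permutations

ELEMENTS = "pqrs"

# Rows of Table 4 (operation +), read left to right under the columns p, q, r, s.
TABLE_4 = {"p": "pqrs", "q": "qrsp", "r": "rspq", "s": "spqr"}


def plus(x, y):
    """Look up x + y in Table 4: row x, column y."""
    return TABLE_4[x][ELEMENTS.index(y)]


def all_groupings(items):
    """Every value obtainable by fully parenthesizing `items` in the given order."""
    if len(items) == 1:
        return {items[0]}
    results = set()
    for split in range(1, len(items)):
        for left in all_groupings(items[:split]):
            for right in all_groupings(items[split:]):
                results.add(plus(left, right))
    return results


combination = ("q", "r", "s", "q")
values = set()
for ordering in permutations(combination):
    values |= all_groupings(ordering)

print(values)  # {'s'} : one result, whatever the grouping and order
```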


EXERCISES 

1. Of the binary operations described in the examples and tables of Sec. 3, 
which are commutative and which are not? 

2. Let the binary operation * in a set S be associative and commutative, and
let the letters in the following formulas denote elements of S. Prove the following
formulas. (Do not omit parentheses; give complete proofs.)

(a) a * (b * c) = b * (a * c)          (b) a * (b * c) = (c * a) * b
(c) c * (a * b) = b * (a * c)          (d) ((a * b) * c) * d = (d * c) * (b * a)
(e) (a * b) * (c * d) = d * (c * (b * a))

3. Find a commutative, but not associative, binary operation in a set consisting
of two elements a and b. In this same set find an associative, but not commutative,
binary operation.

*4. Let * be a binary operation with identity element in S. Prove that the
operation is both associative and commutative if and only if it satisfies the
so-called “commutassociative” law

          (a * b) * (c * d) = (a * c) * (b * d)

for any elements a, b, c, d in S.†


6. Groups 


We now come to the second main axiom of this chapter. By adding to the as- 
sociative law one more simple axiom, (2) below, we arrive at the definition of the 
notion of group. Groups are mathematical systems of great importance, and they 
will be basic throughout the rest of the book. 


DEFINITION 6.1 A set G with a binary operation * is called a group if
(1) The operation * is associative.
(2) G contains an identity element for the binary operation; and every element
in G has an inverse in G for the operation *.


† Asterisks preceding exercise numbers indicate the more difficult exercises.


It is an easy matter to verify that the operations ∘ and + given by Tables 3 and
4, Sec. 3, satisfy these two axioms, whereas the operations of Tables 1 and 2 do not.
The set consisting of the elements p, q, r, s, together with either of the two binary
operations ∘ or +, is a group; the two groups are clearly quite different.
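
The verification of the two axioms can be carried out mechanically for any finite table. The sketch below is illustrative only: the function `is_group`, the dictionary encoding of Table 4, and the operation `first` (which simply returns its first argument and is not one of the tables of Sec. 3) are our own assumptions.

```python
from itertools import product

ELEMENTS = "pqrs"

# Rows of Table 4 (operation +), read left to right under the columns p, q, r, s.
TABLE_4 = {"p": "pqrs", "q": "qrsp", "r": "rspq", "s": "spqr"}


def plus(x, y):
    """Look up x + y in Table 4: row x, column y."""
    return TABLE_4[x][ELEMENTS.index(y)]


def first(x, y):
    """A hypothetical non-example: the result is always the first element."""
    return x


def is_group(elements, op):
    """Check axioms (1) and (2) of Definition 6.1 for a finite set."""
    elements = list(elements)
    # The operation must give an element of the set for every pair.
    if any(op(a, b) not in elements for a, b in product(elements, repeat=2)):
        return False
    # Axiom (1): associativity.
    if any(
        op(op(a, b), c) != op(a, op(b, c))
        for a, b, c in product(elements, repeat=3)
    ):
        return False
    # Axiom (2), first half: an identity element.
    identities = [
        e for e in elements if all(op(e, a) == a == op(a, e) for a in elements)
    ]
    if not identities:
        return False
    e = identities[0]
    # Axiom (2), second half: every element has an inverse.
    return all(
        any(op(a, b) == e == op(b, a) for b in elements) for a in elements
    )


print(is_group(ELEMENTS, plus))   # True
print(is_group(ELEMENTS, first))  # False: there is no identity element
```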

As further examples we give below the tables for two groups based on a set of
six elements p, q, r, s, t, u:


    TABLE 5                        TABLE 6

      | p  q  r  s  t  u             | p  q  r  s  t  u
    --+------------------          --+------------------
    p | p  q  r  s  t  u           p | r  p  q  t  u  s
    q | q  r  s  t  u  p           q | p  q  r  s  t  u
    r | r  s  t  u  p  q           r | q  r  p  u  s  t
    s | s  t  u  p  q  r           s | u  s  t  q  r  p
    t | t  u  p  q  r  s           t | s  t  u  p  q  r
    u | u  p  q  r  s  t           u | t  u  s  r  p  q

Returning to Definition 6.1, observe that a group involves a set of elements
together with a binary operation satisfying the two axioms stated above. But
we shall often use the same letter to denote both the set itself and the group.
The binary operation is usually called the group operation.

From Theorem 3.1 it follows that the identity element of a group G is unique.
From Theorem 4.3, the inverse of any element of G is unique. In this section
we shall denote the identity element by e; the inverse of an arbitrary element a
will be denoted by a⁻¹.


THEOREM 6.1 Let G be a group, with binary operation *. If a * c = a or c * a = a
for some element a in G, then c must be the identity element e of G. If a * c = b * c
or c * a = c * b in G, then a = b. If a * b = e or b * a = e, then a and b are inverses
of each other.

Proof. Let a⁻¹ be the inverse element of a. If a * c = a, then clearly
a⁻¹ * (a * c) = a⁻¹ * a = e, and so (a⁻¹ * a) * c = e, whence e * c = e,
or finally c = e. The arguments for the other assertions are similar.
Q.E.D.


THEOREM 6.2 Let G be a group with operation *. For any two elements a and b of
G the equations a * x = b and y * a = b have unique solutions for the unknowns
x and y, namely x = a⁻¹ * b and y = b * a⁻¹.

Proof. The solutions are necessarily unique. For if a * x = b and
a * x' = b, then a * x = a * x', whence x = x', by the preceding theorem.
Similarly for the other equation. On the other hand, it is trivial to verify
that the solutions given above are correct. For example, a * (a⁻¹ * b) =
(a * a⁻¹) * b = e * b = b, and similarly for the other. Q.E.D.
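
As an illustration of Theorem 6.2, the two equations can be solved by exhaustive search in the group of Table 6 and compared with the formulas of the theorem. The sketch below rests on our own conventions: `TABLE_6` transcribes Table 6 row by row, the identity element of that table is q, and the helper names are hypothetical.

```python
ELEMENTS = "pqrstu"

# Rows of Table 6, read left to right under the columns p, q, r, s, t, u.
TABLE_6 = {
    "p": "rpqtus",
    "q": "pqrstu",
    "r": "qrpust",
    "s": "ustqrp",
    "t": "stupqr",
    "u": "tusrpq",
}
IDENTITY = "q"  # the row and column of q reproduce the headings


def op(x, y):
    """Look up the product in Table 6: row x, column y."""
    return TABLE_6[x][ELEMENTS.index(y)]


def inverse(a):
    """The element b with a * b = b * a = identity."""
    return next(
        b for b in ELEMENTS if op(a, b) == IDENTITY and op(b, a) == IDENTITY
    )


def solve_left(a, b):
    """All x with a * x = b."""
    return [x for x in ELEMENTS if op(a, x) == b]


def solve_right(a, b):
    """All y with y * a = b."""
    return [y for y in ELEMENTS if op(y, a) == b]


a, b = "s", "t"
print(solve_left(a, b), op(inverse(a), b))   # ['r'] r
print(solve_right(a, b), op(b, inverse(a)))  # ['p'] p
```

Each equation has exactly one solution, and the two solutions differ here because the operation of Table 6 is not commutative.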


REMARK 1. If the operation * satisfies the commutative axiom, then the two
solutions x and y above are clearly identical.


The following theorem gives a very important source of groups: 


THEOREM 6.3 Let E be an arbitrary set, and let M denote the set of all one-to-one
mappings of E to itself. Then M, with composition of mappings as a binary
operation, is a group. Its identity element e is the identity mapping of E; the inverse of an
element f of the group M is the inverse mapping f⁻¹.

Proof. Let f, g be two elements of M, that is, two one-to-one
mappings E → E. By definition (Sec. 2), the composition f ∘ g is the mapping
that sends an arbitrary x in E into the element f(g(x)). That is, f ∘ g(x) =
f(g(x)). It is obvious that f ∘ g is again a one-to-one mapping and
therefore is an element of M. Consequently, composition of mappings is a
binary operation in M. It is associative; for if f, g, h are in M, and if x is
any element of E, then f ∘ (g ∘ h) sends x into the element f(g ∘ h(x)) =
f(g(h(x))), and (f ∘ g) ∘ h sends x into f ∘ g(h(x)) = f(g(h(x))). Thus
f ∘ (g ∘ h) = (f ∘ g) ∘ h. To verify axiom (2) of Definition 6.1, let e denote
the identity mapping of E. That is, e(x) = x for all x in E. Plainly e is
an element of M, and for any f in M we have f ∘ e(x) = f(e(x)) = f(x) and
e ∘ f(x) = e(f(x)) = f(x). Hence e ∘ f = f ∘ e = f, showing that e is the
identity element for the binary operation in M. Finally, if f⁻¹ is the
inverse mapping of f, then by definition we have f(f⁻¹(x)) = x and
f⁻¹(f(x)) = x for any x in E. Therefore, f ∘ f⁻¹ = f⁻¹ ∘ f = e, showing
that f has an inverse for the binary operation in M. Q.E.D.


EXAMPLE Let E consist of three elements a, b, c. It is easily seen that there are
precisely six one-to-one mappings of E to itself. Let q denote the identity
mapping; let p denote the mapping of E represented by the symbol

          ( a  b  c )
          ( c  a  b )

by which we mean that p sends an element of the top row into the element of the
bottom row directly beneath it. Thus p(a) = c, etc. Similarly put

     r = ( a  b  c )     s = ( a  b  c )     t = ( a  b  c )     u = ( a  b  c )
         ( b  c  a )         ( b  a  c )         ( a  c  b )         ( c  b  a )

Then we have, for example, r ∘ t(a) = r(t(a)) = r(a) = b, r ∘ t(b) = r(t(b)) =
r(c) = a, and r ∘ t(c) = r(t(c)) = r(b) = c, showing that r ∘ t = s. It is a routine
matter to verify that the operation in the group M of permutations of E is given
by Table 6 above.
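
The routine verification can be delegated to a short program. In the sketch below, which is an illustration under our own conventions, each mapping is stored as a Python dictionary sending a, b, c to their images (the bottom row of its symbol), and the names `mapping`, `compose`, `name_of` and `table` are hypothetical helpers. The entry in row x and column y is the composition x ∘ y, that is, y is applied first.

```python
def mapping(bottom_row):
    """The mapping of E = {a, b, c} whose symbol has the given bottom row."""
    return dict(zip("abc", bottom_row))


NAMED = {
    "p": mapping("cab"),
    "q": mapping("abc"),  # the identity mapping
    "r": mapping("bca"),
    "s": mapping("bac"),
    "t": mapping("acb"),
    "u": mapping("cba"),
}


def compose(f, g):
    """The composition f o g, which sends x into f(g(x))."""
    return {x: f[g[x]] for x in "abc"}


def name_of(h):
    """The letter among p, q, r, s, t, u that denotes the mapping h."""
    return next(name for name, m in NAMED.items() if m == h)


def table():
    """Row x, column y holds the name of x o y."""
    return {
        x: "".join(name_of(compose(NAMED[x], NAMED[y])) for y in NAMED)
        for x in NAMED
    }


for row_label, row in table().items():
    print(row_label, "|", "  ".join(row))
```

The six printed rows can then be compared entry by entry with Table 6.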


It may happen that a certain subset of elements in a group is itself a group, 
and we now give a definition bearing on this situation. 


DEFINITION 6.2 Let G be a group with operation *, and let H be a non-empty subset
of G. Then H is called a subgroup of G if for any elements a, b in H the elements
a⁻¹, b⁻¹, and a * b are also in H.

The definition requires that * applied to a pair of elements of H must produce
an element in H. Hence, * gives a binary operation in H. Since H contains at
least one element a, hence also a⁻¹, by assumption, it follows that H contains
a * a⁻¹ = e. It is easy to verify that H, with the operation *, is a group.


THEOREM 6.4 Let H be a subgroup of a group G. Then G and H have the same 
identity element. Furthermore, the inverse of an element of H is the same, whether 
the element is considered as being in H or in the larger group G. The identity element 
of any group forms by itself a subgroup. 
Proof. The first two assertions follow from Theorem 6.1. The last
assertion is trivial. Q.E.D.


Some examples of subgroups are readily found in the groups defined by Tables 3
and 4. The following indicate subgroups of the group given by Table 3:


      | p          | p  q          | p  r          | p  s
    --+---       --+------       --+------       --+------
    p | p        p | p  q        p | p  r        p | p  s
                 q | q  p        r | r  p        s | s  p


The following indicate subgroups of the group given by Table 4:


      | p          | p  r
    --+---       --+------
    p | p        p | p  r
                 r | r  p
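
For a small group the subgroups can be found by testing every non-empty subset against Definition 6.2. The sketch below does this for the group of Table 4; it is an illustration only, with `TABLE_4` transcribing that table and the helper names being our own.

```python
from itertools import combinations

ELEMENTS = "pqrs"

# Rows of Table 4 (operation +), read left to right under the columns p, q, r, s.
TABLE_4 = {"p": "pqrs", "q": "qrsp", "r": "rspq", "s": "spqr"}
IDENTITY = "p"


def plus(x, y):
    """Look up x + y in Table 4: row x, column y."""
    return TABLE_4[x][ELEMENTS.index(y)]


def inverse(a):
    """The element b with a + b = b + a = identity."""
    return next(
        b for b in ELEMENTS if plus(a, b) == IDENTITY and plus(b, a) == IDENTITY
    )


def is_subgroup(subset):
    """Definition 6.2: non-empty, closed under inverses and under the operation."""
    subset = set(subset)
    if not subset:
        return False
    return all(inverse(a) in subset for a in subset) and all(
        plus(a, b) in subset for a in subset for b in subset
    )


def subgroups():
    """Every subset of the group that satisfies Definition 6.2."""
    return [
        "".join(subset)
        for size in range(1, len(ELEMENTS) + 1)
        for subset in combinations(ELEMENTS, size)
        if is_subgroup(subset)
    ]


print(subgroups())  # ['p', 'pr', 'pqrs']
```

Besides the two subgroups indicated above, the search also reports the whole group, which satisfies Definition 6.2 as well.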


As a further example, let E be any set and let E' be a subset. Let M be the
group of all permutations (i.e., one-to-one mappings of E to itself), as in Theorem
6.3. Let M' denote the subset of M consisting of all mappings f such that f(x) = x
for every element x of E'. It is quickly seen that M' is a subgroup of M.


REMARK 2. G being a group, G itself should be counted among the subgroups of G, 
for it clearly satisfies Definition 6.2. In this context it is sometimes referred to as 
the improper subgroup, all other subgroups being called proper. 


EXERCISES 


1. Give an example of a group containing one element; give an example of a 
group containing three elements. 
2. Let Y denote the binary operation of Table 5. Solve the following equations 
for x and y: 
rYue=t yOrat 
rYuar yQrer 
Solve the same equations for the binary operation of Table 6. Are these opera- 
tions commutative? 
3. Find a rule permitting you to determine rapidly from examination of a table 
whether it defines a commutative binary operation. 


16 Binary operations and groups Ch. 1, Sec. 7 


4. The group of Table 6 contains a subgroup of one element and a subgroup
of three elements. What are those subgroups? Find subgroups of one, two, and
three elements of the group of Table 5.

5. Verify that the following is the table of a group:


      | p  q  r
    --+---------
    p | p  q  r
    q | q  r  p
    r | r  p  q


6. Let E be a set of two elements. Write out the table of the group M of
one-to-one mappings E → E (see Theorem 6.3).

7. Let E be a set of four elements. Write out the table for the group M of
one-to-one mappings of E to itself (as in Theorem 6.3). Find at least three proper
subgroups of M, using the observation preceding Remark 2 above.

8. Let G be a group with operation *, and let H be a non-empty subset. Show
that H is a subgroup of G if and only if a⁻¹ * b is in H whenever a, b are in H.

9. Let H, K be subgroups of a group G, and denote by H ∩ K the subset of G
consisting of the elements in both H, K. Prove that H ∩ K is also a subgroup
of G, as well as of H and K.

10. Let G be a group. Show that the set of elements common to any family
of subgroups is itself a subgroup of G.

11. Let G be a group with operation *, and let c be a fixed element of G. Show
that the mapping which sends an arbitrary element x of G into c * x is a one-to-one
mapping of G to itself (called left translation by c). Prove that right translation
by c, defined by x → x * c, is also a one-to-one mapping of G to itself.

12. Consider a square S in the plane, and let M denote the set of rotations which
bring S into coincidence with itself. Show that M is a group of four elements
(composition of mappings being the group operation), and write out its table.
What do you get if S is replaced by an equilateral triangle?

*13. Let T be a regular tetrahedron (four-sided solid). Let M denote the set of
rotations which bring T into coincidence with itself. Show that M (with
composition of rotations as operation) is a group containing 12 elements.

14. What are the identity elements for the groups of Tables 5 and 6? Write
down the inverses for all six elements (for both tables).


7. Isomorphisms and homomorphisms 


Let us return for a moment to the three subgroups


      | p  q          | p  r          | p  s
    --+------       --+------       --+------
    p | p  q        p | p  r        p | p  s
    q | q  p        r | r  p        s | s  p


of the group of Table 3. It is plain that these groups are essentially the same.
Their elements are different, but their tables are the same, apart from the particular
symbols used. The three groups are thus interchangeable, in much the same way
that two automobiles of the same model, or two decks of cards, are interchangeable.

The groups above are said to be isomorphic (literally, of the same form). More
generally, a group G and a group G' are isomorphic if the table for G is transformed
into the table for G' by a suitable substitution of elements. For example, the
substitution of r for q in the first table above transforms it into the second. Since
we are about to abandon the use of tables, we formulate a definition which does
not involve them.


DEFINITION 7.1 Let G and G' be groups with group operations * and ◇,
respectively. These groups are said to be isomorphic, in symbols G ≅ G', if there exists a
one-to-one mapping f from G to G' such that


(7.1)     f(a * b) = f(a) ◇ f(b)


for all a, b in G. Any such mapping is called an isomorphism.

In short, f is required to be “compatible” with the group operations, for (7.1)
says that f sends a * b into a' ◇ b', where a' = f(a) and b' = f(b) are the elements
of G' corresponding to a, b in G. Hence f, applied to every element in the table for
G, simply carries the table for G into that for G'.

It is clear that the inverse mapping f⁻¹ from G' to G has the same property.
That is,


          f⁻¹(a' ◇ b') = f⁻¹(a') * f⁻¹(b')


Therefore f⁻¹ is an isomorphism from G' to G.
It may be possible to find many isomorphisms from one group to another, or 
from a group to itself. 


Remark. The groups of Tables 3 and 4 are not isomorphic, as can be seen by 
comparing their subgroups. For it is clear that an isomorphism from one group to 
another must carry subgroups of the first group into subgroups of the second. 
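
The Remark can also be confirmed by brute force: there are only 24 one-to-one mappings of the set p, q, r, s to itself, and each can be tested against (7.1). The sketch below is illustrative and rests on two assumptions of ours. First, `TABLE_3` is our transcription of Table 3, reconstructed from the facts stated in the text (p is the identity and each of q, r, s is its own inverse, which forces the remaining entries). Second, the helper names are hypothetical.

```python
from itertools import permutations

ELEMENTS = "pqrs"

# Assumed transcription of Table 3 (operation ∘): every element is its own inverse.
TABLE_3 = {"p": "pqrs", "q": "qpsr", "r": "rspq", "s": "srqp"}
# Rows of Table 4 (operation +).
TABLE_4 = {"p": "pqrs", "q": "qrsp", "r": "rspq", "s": "spqr"}


def operation(table):
    """Turn a table into a function (x, y) -> entry in row x, column y."""
    return lambda x, y: table[x][ELEMENTS.index(y)]


def isomorphisms(op1, op2):
    """All one-to-one mappings f with f(a * b) = f(a) ◇ f(b), as in (7.1)."""
    found = []
    for images in permutations(ELEMENTS):
        f = dict(zip(ELEMENTS, images))
        if all(
            f[op1(a, b)] == op2(f[a], f[b]) for a in ELEMENTS for b in ELEMENTS
        ):
            found.append(f)
    return found


circle, plus = operation(TABLE_3), operation(TABLE_4)

print(len(isomorphisms(circle, plus)))    # 0: the two groups are not isomorphic
print(len(isomorphisms(plus, plus)))      # 2 isomorphisms of Table 4 to itself
print(len(isomorphisms(circle, circle)))  # 6 isomorphisms of Table 3 to itself
```

The last two counts illustrate the earlier observation that a group may have several isomorphisms to itself.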


Mappings which are compatible with group operations without necessarily 
being one-to-one are of great importance. They are called homomorphisms: 


DEFINITION 7.2 Let f: G → G' be a mapping of one group to another, with group
operations * and ◇, respectively. Then f is called a homomorphism if

(7.2)     f(a * b) = f(a) ◇ f(b)

for all a, b in G.
This differs from the preceding definition only in that the one-to-one require- 
ment has been dropped. An isomorphism is thus a special kind of homomorphism. 




EXAMPLE 1 The whole numbers (positive, negative, and zero) with the binary
operation of addition form a group, and the mapping f defined by f(a) = 2a is a
homomorphism of that group to itself.


EXAMPLE 2 Let H be a subgroup of a group G, and define f: H → G by f(x) = x
for any x in H. Then f is a homomorphism.


EXAMPLE 3 If G, G' are groups, then the mapping that sends every element of G
into the identity element of G' is a homomorphism.


The following two theorems about homomorphisms are basic: 


THEOREM 7.1 Let f be a homomorphism from a group G to a group G'. Then f maps
the identity element of G into the identity of G'; and if f maps an element a into a',
then it maps the inverse of a into the inverse of a'. In symbols,


          f(e) = e' = identity of G'
          f(a⁻¹) = f(a)⁻¹

Proof. Denote the group operations in G, G' by * and ◇, respectively.
For any element a in G we have a = e * a, whence f(a) = f(e * a).
Applying (7.2) to the right-hand side, we obtain f(a) = f(e) ◇ f(a). From
Theorem 6.1 applied to the elements f(a), f(e) in G', we conclude that
f(e) = e'. To prove the second part, we have a * a⁻¹ = e in G, and so
f(a * a⁻¹) = f(e). Using (7.2) on the left and the result just obtained on
the right, we get f(a) ◇ f(a⁻¹) = e'. From Theorem 6.1 in G' we see that
f(a⁻¹) must be the inverse of f(a). Q.E.D.


THEOREM 7.2 Let f be a homomorphism from a group G to a group G'. If H is a
subgroup of G, then f(H) is a subgroup† of G'. If K is a subgroup of G', then f⁻¹(K) is
a subgroup of G.

Proof. By definition, f(H) consists of all elements f(x) with x in H.
Hence, if a' is in f(H), then there is an element a in H such that f(a) = a'.
But a⁻¹ is also in H, by definition of subgroup, and so f(a⁻¹) is also in f(H).
By the preceding theorem, f(a⁻¹) = a'⁻¹. Therefore the inverse of any
element in f(H) is also in f(H). If b' is a second element of f(H), then
there is b in H such that f(b) = b'. Then a * b is also in H (notation as
above), since H is a subgroup, and so f(a * b) is accordingly in f(H). By
(7.2), f(a * b) = a' ◇ b'. This proves that f(H) satisfies Definition 6.2
in G'.

To prove the second part, recall that f⁻¹(K) is the set of all x such that
f(x) is in K. Accordingly, if a is in f⁻¹(K), then f(a) is in K. Its inverse
f(a)⁻¹ must also be in K, by definition of subgroup, and by the theorem
above we have f(a⁻¹) = f(a)⁻¹. That is, f(a⁻¹) is in K, and so a⁻¹ must be
in the set f⁻¹(K). Hence, the inverse of any element in f⁻¹(K) is also in
that set. Now if b is another element of f⁻¹(K), then f(b) is in K.
Therefore f(a) ◇ f(b) is in K, since K is a subgroup of G'. From (7.2) it follows
that f(a * b) is in K; that is, a * b is in f⁻¹(K). Therefore, f⁻¹(K) satisfies
Definition 6.2 in G. Q.E.D.

† This notation is explained in Sec. 2 and is recalled briefly in the course of the proof.


Remark. Taking H = G in the theorem, we see that f(G) is a subgroup of G'.
It is called the image of f. Now the identity element e' of G' forms a subgroup of
G'. Taking that for K in the theorem, we see that f⁻¹(e') is a subgroup of G; it
consists of all elements which are mapped into e' by f. The subgroup f⁻¹(e') is
called the kernel of f.


THEOREM 7.3 Let f be a homomorphism from a group G to a group G'. Then f maps
distinct elements of G into distinct elements of G' if and only if the kernel of f consists
of the identity element alone.

Proof. The kernel of f, being a subgroup of G, must certainly contain
the identity element e of G. If the kernel contains another element a, then
by definition we have f(a) = e', and so f(a) = f(e), showing that f does
not map distinct elements of G into distinct elements of G'.

Suppose now that the kernel of f consists of e alone. That is, only e is
mapped by f into e'. If f(a) = f(b), then f(a * b⁻¹) = e'; for by (7.2) the
left member here is equal to f(a) ◇ f(b⁻¹), which, by Theorem 7.1, is equal
to f(a) ◇ f(b)⁻¹ = e'. Then from our assumption it follows that a * b⁻¹ =
e, whence a = b. Q.E.D.


COROLLARY A homomorphism f: G → G' of two groups is an isomorphism if and
only if the image of f is equal to G' and the kernel of f consists solely of the identity
element of G.

For this is precisely the condition for f to be one-to-one. 
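
The notions of image and kernel are easily computed in a finite example. The sketch below is an illustration of our own devising: in the group of Table 4 we take the "doubling" mapping f(x) = x + x, a finite analogue of Example 1, and the names `double`, `kernel` and `image` are hypothetical.

```python
ELEMENTS = "pqrs"

# Rows of Table 4 (operation +), read left to right under the columns p, q, r, s.
TABLE_4 = {"p": "pqrs", "q": "qrsp", "r": "rspq", "s": "spqr"}
IDENTITY = "p"


def plus(x, y):
    """Look up x + y in Table 4: row x, column y."""
    return TABLE_4[x][ELEMENTS.index(y)]


def double(x):
    """The mapping f(x) = x + x of the group of Table 4 to itself."""
    return plus(x, x)


# Condition (7.2), with both group operations equal to +.
is_homomorphism = all(
    double(plus(a, b)) == plus(double(a), double(b))
    for a in ELEMENTS
    for b in ELEMENTS
)
kernel = [x for x in ELEMENTS if double(x) == IDENTITY]
image = sorted({double(x) for x in ELEMENTS})

print(is_homomorphism)  # True
print(kernel)           # ['p', 'r']
print(image)            # ['p', 'r']
```

Here the kernel contains r as well as the identity p, so by Theorem 7.3 the mapping does not send distinct elements into distinct elements; indeed f(q) = f(s) = r. By the Corollary it is therefore not an isomorphism.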


EXERCISES 

1. Find ¢~, (s © 7, (uv! + s+ f)~!, un « t- for the group defined by Table 5: 
Compute the same expressions for Table 6. 

2. Let G be a group, and let f be the mapping of G to itself defined by f(a) = a⁻¹
(inverse of a). Show that f is one-to-one, and prove that it is an isomorphism if
and only if† the group operation in G is commutative. Exhibit this isomorphism
explicitly for the groups of Tables 4 and 5.

3. Let f be a homomorphism from a group G to a group G', and let g be a
homomorphism from G' to a group G''. Prove that the composition g ∘ f is a
homomorphism from G to G''.

4. Let f and g be homomorphisms of a group G to a group G'. Denoting the
operation in G' by ×, define a mapping h of G to G' by h(a) = f(a) × g(a). Prove
that h is a homomorphism if × is commutative.


† The expression “if and only if” means that the statements standing on either side of it are
equivalent. Here you must show that, if f is an isomorphism, then the group operation is
commutative. And, vice versa, you must show that, if the group operation is commutative,
then f is an isomorphism.


5. Let a be an element of a group G with operation *, and let T_a denote the
mapping of G to itself defined by T_a(x) = a * x (called left translation by a; see
Exercise 11, Sec. 6). Show that all such mappings T_a form a group G' with
composition as group operation, and show that the mapping G → G' defined by a → T_a
is an isomorphism.


8. Restatement of the group axioms 


For convenience we repeat here the axioms for a group, and at the same time we
introduce some less cumbersome notation. Namely, we shall indicate the group
operation by writing simply ab instead of a * b or a ◇ b, etc. The element ab is
sometimes called the product of a and b.

A group G is a set equipped with an operation which assigns to each pair of elements
a, b in G an element ab in G, subject to the following conditions:


(8.1)     a(bc) = (ab)c for any elements a, b, c in G.

(8.2)     G contains an element e such that ea = ae = a for every element a in G.

(8.3)     For every element a in G there is an element in G, denoted by a⁻¹, such that
          aa⁻¹ = a⁻¹a = e.


If the group operation satisfies the commutative axiom

(8.4)     ab = ba for any elements a, b in G


then the group is said to be commutative, or abelian.†

With the notation just introduced, the group is said to be written
multiplicatively, or to be a multiplicative group. Naturally this has nothing to do with the
group; it is merely a brief way of announcing the kind of notational conventions
one is about to employ.

We point out the following important rules for inverses:


(8.5)     e⁻¹ = e
          (a⁻¹)⁻¹ = a           for any a in G
          (ab)⁻¹ = b⁻¹a⁻¹       for any a, b in G


These equations merely restate Theorem 4.4 applied to the case at hand.
If G and G' are two groups, both written multiplicatively, then a homomorphism
f from G to G' is a mapping such that


(8.6)     f(ab) = f(a)f(b)


according to (7.2).

Instead of the multiplicative notation just described, the group operation in an
abelian group is often indicated by +. When a + b is used instead of ab, we say
that the group is written additively, or that it is an additive group. In this case
the identity element is customarily denoted by 0 instead of by e, and the inverse
of an element a is denoted by -a instead of a⁻¹. Equations (8.1) to (8.6) become
the following in the additive notation:

(8.7)     a + (b + c) = (a + b) + c
(8.8)     a + 0 = a
(8.9)     a + (-a) = 0
(8.10)    a + b = b + a
(8.11)    -0 = 0
          -(-a) = a
          -(a + b) = -a + (-b)
(8.12)    f(a + b) = f(a) + f(b)

Finally, a combination c + (-d) is written simply c - d. Equation (8.9) and
the last equation of (8.11) can be written as

(8.13)    a - a = 0          -(a + b) = -a - b

The equations of Theorem 7.1 become

(8.14)    f(0) = 0           f(-a) = -f(a)

† After the Norwegian mathematician N. H. Abel (1802-1829).


Many important theorems about groups depend upon certain properties of the 
integers. Groups will occur in all the chapters in connection with various mathe- 
matical systems. But in Chap. 10 we shall return briefly to the subject of groups 
considered by themselves, rather than as ingredients in more complex systems. 


EXERCISES 
1. Let G be an abelian group with the additive notation. Prove the following 
identities for elements in G: 
(a) -(-a - b) = a + b            (b) -(a - b) = -a + b = b - a
(c) a - 0 = a                    (d) (c - b) + (b - a) = c - a
(e) (c - b) - (a - b) = c - a
2. Rewrite all the identities above in the multiplicative notation. Do they depend
upon the commutative law?


9.† Systems with two binary operations:
rings, integral domains, fields 


Whole numbers can be combined by means of the familiar operations of addition,
subtraction, multiplication, and in certain cases, also by division. There appear
to be four binary operations here, but we shall soon see that two of them
(subtraction and division) are easily defined in terms of the other two (addition and
multiplication). We wish to investigate the relation between the latter two
operations, and in order to achieve a setting which will also serve us for the
rational-, real-, and complex-number systems, we shall begin by studying briefly
abstract systems with two binary operations. In this and later chapters, we shall
encounter many important examples of rings, which we now define.

† This section consists of the material in Sec. 2, Chap. 2, and Sec. 2, Chap. 8; it is excerpted
here for the convenience of readers who wish to pass directly to the study of linear algebra
in Chap. 8.


DEFINITION 9.1 A set A is called a ring if it is equipped with two binary operations
+ and × such that the following axioms hold:
(1) A, with the operation +, is an abelian group.
(2) The operation × is associative.
(3) A contains an identity element e for the operation ×:
        e × a = a × e = a        for any element a of A.
(4) a × (b + c) = (a × b) + (a × c) and (b + c) × a = (b × a) + (c × a)
        for any elements a, b, c of A (distributive axiom).
We shall encounter many important examples of rings. The identity element e
is often denoted by 1.
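For a finite set the four ring axioms can be verified mechanically. The sketch below is illustrative only: the function `is_ring` is a hypothetical helper, and the integers modulo 6 are an example chosen here, not one taken from the text. The second call swaps in a deliberately wrong "multiplication" to show which kind of system the axioms exclude.

```python
from itertools import product

def is_ring(elements, add, mul, zero, one):
    """Brute-force check of axioms (1) to (4) on a finite set."""
    E = list(elements)
    S = set(E)
    pairs = list(product(E, repeat=2))
    triples = list(product(E, repeat=3))
    # Both operations must be binary operations on the set.
    closed = all(add(a, b) in S and mul(a, b) in S for a, b in pairs)
    # Axiom (1): abelian group under +.
    add_assoc = all(add(a, add(b, c)) == add(add(a, b), c) for a, b, c in triples)
    add_ident = all(add(a, zero) == a for a in E)
    add_inv = all(any(add(a, b) == zero for b in E) for a in E)
    add_comm = all(add(a, b) == add(b, a) for a, b in pairs)
    # Axiom (2): multiplication is associative.
    mul_assoc = all(mul(a, mul(b, c)) == mul(mul(a, b), c) for a, b, c in triples)
    # Axiom (3): two-sided identity for multiplication.
    mul_ident = all(mul(one, a) == a == mul(a, one) for a in E)
    # Axiom (4): both distributive laws.
    distributive = all(
        mul(a, add(b, c)) == add(mul(a, b), mul(a, c))
        and mul(add(b, c), a) == add(mul(b, a), mul(c, a))
        for a, b, c in triples
    )
    return all([closed, add_assoc, add_ident, add_inv, add_comm,
                mul_assoc, mul_ident, distributive])

n = 6  # hypothetical modulus
Zn = range(n)
zn_is_ring = is_ring(Zn, lambda a, b: (a + b) % n, lambda a, b: (a * b) % n, 0, 1)

# Using addition in the role of multiplication breaks the distributive axiom.
broken = is_ring(Zn, lambda a, b: (a + b) % n, lambda a, b: (a + b) % n, 0, 0)

print(zn_is_ring, broken)
```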

According to Definition 6.1, A must contain an identity element for the operation +.
We will denote this identity element by 0; thus a + 0 = a. And every
element a has an inverse -a with the characteristic property a + (-a) = 0.
Recall that a - b stands for a + (-b). We call a + b the sum of a and b, a - b
their difference, and the operation + addition.

For the present we do not require that the second operation × be commutative,
although that assumption will soon be added. (If × happens to be commutative,
then we call A a commutative ring.) We shall call the operation × multiplication;
the element a × b will be called the product of a and b, and to simplify the notation
we shall generally write it as a · b or ab. Axiom (3) states that A contains an
identity element e for multiplication; e is usually called the unit element of the ring.
Axiom (4) supplies the only connection between the two binary operations +
and ×.

Note that the introduction of the terms “addition” and “multiplication” for 
the operations of A is purely a matter of convenience. We do not mean to imply 
that the elements of A necessarily have any connection with numbers, or that the 
two operations + and × have any connection with addition and multiplication of
numbers, except insofar as axioms (1) to (4) hold in both cases. 


NOTATION In order to minimize the use of cumbersome parentheses we shall adopt
the following well-known convention: in any expression without parentheses
involving combinations of elements of A by both + and ×, we understand that the
× operations are to be performed first, and then the + operations. For example,
ab + c stands for (ab) + c [not a(b + c)!]; abc - a'b' + cd stands for (abc) -
(a'b') + (cd). Thus the distributive axiom can be written simply as


        a(b + c) = ab + ac
        (b + c)a = ba + ca




Since both + and × are associative, we do not need parentheses to indicate
grouping of terms, although we still need them of course if the order in which the
operations are to be performed is not covered by the above convention, as in
a(b + c). The expression abc stands for either a(bc) or (ab)c, as explained in
Sec. 4. We now prove some very useful theorems which are true for any ring A.


THEOREM 9.1 The unit element of a ring A is unique. 
This has already been proved in Theorem 3.1. 


THEOREM 9.2 a · 0 = 0 · a = 0 for any element a of A.
Proof. For any element b in A we have b + 0 = b, and therefore
(b + 0) · a = b · a. On the other hand, by axiom (4), (b + 0) · a =
b · a + 0 · a, and so b · a + 0 · a = b · a. Adding -(b · a) to both sides,
we obtain 0 · a = 0.
Similarly, a · (b + 0) = a · b + a · 0 and a · (b + 0) = a · b; so a · b +
a · 0 = a · b. Adding -(a · b) to both sides, we obtain a · 0 = 0.


THEOREM 9.3 (-a) · b = a · (-b) = -(a · b) for any elements a, b of A.

Proof. We have a + (-a) = 0. Taking the product with b on the
right of both sides of this equality, we obtain (a + (-a)) · b = 0 · b.
Using axiom (4) on the left-hand side and Theorem 9.2 on the right, we
obtain a · b + (-a) · b = 0. Hence, by uniqueness of inverses, (-a) · b =
-(a · b).

The proof that a · (-b) = -(a · b) is quite similar and is left as an
exercise.
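Theorems 9.2 and 9.3 make no use of commutativity of multiplication, so it is instructive to test them where multiplication is not assumed commutative. The sketch below uses 2 × 2 matrices with entries taken modulo 3, with the row-by-column product; that this system is a ring is assumed here rather than proved, and the modulus 3 and the tuple encoding are choices made only for this illustration.

```python
from itertools import product

P = 3  # hypothetical modulus for the matrix entries

# The tuple (a, b, c, d) stands for the matrix with rows (a, b) and (c, d).
M = list(product(range(P), repeat=4))
ZERO = (0, 0, 0, 0)

def add(x, y):
    return tuple((s + t) % P for s, t in zip(x, y))

def neg(x):
    """Additive inverse, entry by entry."""
    return tuple((-s) % P for s in x)

def mul(x, y):
    """Row-by-column product of two 2 x 2 matrices."""
    a, b, c, d = x
    a2, b2, c2, d2 = y
    return ((a * a2 + b * c2) % P, (a * b2 + b * d2) % P,
            (c * a2 + d * c2) % P, (c * b2 + d * d2) % P)

# Theorem 9.2: x * 0 = 0 * x = 0 for every x.
thm_zero = all(mul(x, ZERO) == ZERO == mul(ZERO, x) for x in M)

# Theorem 9.3: (-x) * y = x * (-y) = -(x * y) for all x, y.
thm_signs = all(mul(neg(x), y) == mul(x, neg(y)) == neg(mul(x, y))
                for x in M for y in M)

print(thm_zero, thm_signs)
```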


The results of Theorem 9.3, Exercise 2, and the rule -(-a) = a, obtained in
Sec. 8, embody all the familiar rules of signs used in algebraic computations.
These rules are thus seen to be necessary consequences of simple assumptions 
about the binary operations. 

From Theorem 9.3 it follows that there is no ambiguity in writing —ab for 
—(ab), and we shall usually use that simpler notation. 


DEFINITION 9.2 A being a ring, let B be a subset of A. Then B is called a subring
of A if
(i) B contains the unit element e of A.
(ii) For any two elements a, b of B the elements a + b, a - b, and ab are also
in B.
It is easy to verify that under these conditions the subset B, equipped with the
two binary operations + and × in A, is itself a ring.

Just as with groups in this chapter, we may study mappings of one ring to
another which are “compatible” with the two operations.


DEFINITION 9.3 A mapping f of one ring A to another ring A' is called a homomorphism
(or, more precisely, a ring-homomorphism) if, for any elements a and b
of A, the following equations hold:


        f(a + b) = f(a) + f(b)
        f(ab) = f(a) · f(b)

Let us compare this new notion of homomorphism to that already defined in 
Sec. 7. If we ignore the multiplication in the rings A and A’, we may consider 
them as abelian groups. The first equation above is then precisely the require- 
ment that f be a homomorphism of A to A’ (considered as groups). Therefore, 
if 0 denotes the zero element of A, as well as that of A’, we may conclude from 
equations (8.12) and (8.14) that 

        f(0) = 0
9.1     f(-a) = -f(a)
        f(a - b) = f(a) - f(b)

Note that f does not necessarily satisfy f(e) = e', where e' is the unit element of
A'. Indeed the mapping defined by f(a) = 0 for every element a of A is a ring-homomorphism.
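The zero mapping is not the only ring-homomorphism that fails to carry the unit element to the unit element. As a sketch, and assuming the integers modulo 6 as both rings (an example chosen here, not taken from the text), multiplication by 3 respects both operations and yet sends 1 to 3.

```python
N = 6  # hypothetical modulus; both rings are the integers mod 6

def f(a):
    """Hypothetical map of the ring to itself: multiplication by 3."""
    return (3 * a) % N

def zero_map(a):
    """The mapping that sends every element to 0."""
    return 0

def is_ring_hom(h):
    """Check h(a + b) = h(a) + h(b) and h(ab) = h(a) h(b) for all a, b."""
    A = range(N)
    return all(
        h((a + b) % N) == (h(a) + h(b)) % N
        and h((a * b) % N) == (h(a) * h(b)) % N
        for a in A for b in A
    )

print(is_ring_hom(f), f(1))                # a ring-homomorphism with f(1) = 3
print(is_ring_hom(zero_map), zero_map(1))  # the zero mapping, with image 0
```

The check for f succeeds because 3 · 3 = 9 leaves remainder 3 on division by 6, so f(a) f(b) = 9ab agrees with f(ab) = 3ab.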
Many of the rings which we shall study have several additional properties, and
it is convenient to introduce the following definition.


DEFINITION 9.4 A ring A is an integral domain if it satisfies the following axioms:
(5) The × operation is commutative.
(6) If a, b, c are any elements of A, with c ≠ 0, and if ac = bc, then a = b.
Axiom (6) is naturally called the cancellation axiom, or law.

Integral domains are clearly special cases of rings, since they are rings which
satisfy certain additional conditions.

THEOREM 9.4 A commutative ring A is an integral domain if and only if for any
elements a, b of A we have ab ≠ 0 unless a = 0 or b = 0.

Proof. Suppose first that A is an integral domain, so that the cancellation
law holds in A; and suppose that ab = 0, with say a ≠ 0. We
can write our equation as ab = a · 0, and so from the cancellation law
there follows b = 0.

Conversely, suppose that a product in A cannot be zero unless one
of the two factors is zero. We show that the cancellation law holds.
Namely, if ac = bc, then ac - bc = 0, or (a - b) · c = 0. If c ≠ 0, then
by our assumption the first factor a - b is zero. That is, a = b. Q.E.D.
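The equivalence between the cancellation law and the absence of zero products can be observed in the commutative rings of integers modulo n. The following sketch (the rings and the helper names are assumptions made for illustration) tests both conditions separately and confirms that they hold for exactly the same moduli.

```python
def has_no_zero_divisors(n):
    """True if ab = 0 (mod n) never happens with a and b both nonzero."""
    return all((a * b) % n != 0 for a in range(1, n) for b in range(1, n))

def cancellation_holds(n):
    """Axiom (6): ac = bc with c nonzero forces a = b, tested mod n."""
    return all(
        a == b
        for a in range(n) for b in range(n) for c in range(1, n)
        if (a * c) % n == (b * c) % n
    )

moduli = range(2, 13)
agree = all(has_no_zero_divisors(n) == cancellation_holds(n) for n in moduli)
domains = [n for n in moduli if has_no_zero_divisors(n)]

print(agree)
print(domains)  # [2, 3, 5, 7, 11]
```

Modulo 6, for instance, 2 · 3 = 0 although neither factor is zero, and correspondingly 2 · 3 = 4 · 3 although 2 and 4 are different.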


EXERCISES 


1. Prove that a ring contains only one unit element. 
2. Prove that, if a, b, c, and d are elements of a ring A, then
(a) (-a) · (-b) = a · b
(b) (-e) · a = a · (-e) = -a
(c) (-a) · (-b) · (-c) = -(a · b · c)
(d) (a + b) · (c + d) = ac + bc + ad + bd




3. Let A be a ring. A 2 × 2 matrix with coefficients in A consists of four elements
a, b, c, d of A written in a square array


        ( a  b )
        ( c  d )

Let M be the set of all such 2 × 2 matrices. We define two binary operations
in M as follows:


        ( a  b )   ( a'  b' )   ( a + a'   b + b' )
        ( c  d ) + ( c'  d' ) = ( c + c'   d + d' )

        ( a  b )   ( a'  b' )   ( aa' + bc'   ab' + bd' )
        ( c  d ) × ( c'  d' ) = ( ca' + dc'   cb' + dd' )


(A brief inspection shows that in the second equation each of the four entries in 
the right-hand side is obtained by combining a certain row of the first matrix 
with a certain column of the second, in the manner indicated.) Prove that M 
with these two operations is a ring. What are the zero and unit elements of M? 
Is M an integral domain? (M is an example of a matrix algebra; such systems 
are very important in many applications and will be studied in some detail in 
later chapters of this book.) 

4. Let A be a ring, and let U be the subset of A consisting of all elements of
A having inverses for the multiplication operation ×. Prove:

(a) If x, y are in U, so is xy.
(b) U, with the binary operation ×, is a group.

5. Let h be a ring-homomorphism of a ring A to an integral domain A'. Assume
that h is not identically zero [that is, there is at least one element a in A such
that h(a) ≠ 0]. Prove that h(e) = e', where e' is the unit element of A'.

6. Prove in detail that a subring of a ring (see Definition 9.2) is itself a ring. 

7. Let h be a homomorphism from a ring A to a ring A', and let B be the subset
consisting of all elements of A that are sent by h into the zero element of A'.
Prove that B is closed under + and ×.


DEFINITION 9.5 A field K is an integral domain, containing more than one element,
such that any element of K other than zero has an inverse with respect to multiplication.

Thus if a is in K and a ≠ 0, then there must be in K an element (we denote
it as usual by a^{-1}) such that aa^{-1} = 1.† From Theorems 3.1 and 4.3 it follows
that the unit element 1 of K is unique and the inverse a^{-1} of any element a is
unique. In particular, we have 1^{-1} = 1 and (a^{-1})^{-1} = a for any element a ≠ 0
and (ab)^{-1} = a^{-1}b^{-1} for any elements a, b ≠ 0. We observe furthermore that it
is not necessary to assume the cancellation law for multiplication in K because
it is a necessary consequence of the existence of inverses by Theorem 4.5. From
Theorem 9.4 we recall that a product ab in K cannot be zero unless a = 0 or b = 0.


† We use the symbol 1 instead of e for the unit element of K.




EXAMPLE The rational-number system is a field (cf. Sec. 8, Chap. 8). The real-number
system (cf. Chap. 4) and the complex-number system (cf. Chap. 5)
are fields. For any prime number p, one can find fields with exactly p elements
(cf. Sec. 10, Chap. 2).
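The last statement of the example can be made concrete with the integers modulo n. As a sketch only (taking for granted that these form a commutative ring, and using hypothetical helper names), we search for multiplicative inverses and see for which moduli every nonzero element has one.

```python
def inverse_mod(a, n):
    """Return b with a*b = 1 (mod n), or None if no such b exists."""
    for b in range(1, n):
        if (a * b) % n == 1:
            return b
    return None

def every_nonzero_element_invertible(n):
    """The defining property of a field, tested for the integers mod n."""
    return n > 1 and all(inverse_mod(a, n) is not None for a in range(1, n))

inverses_mod_7 = {a: inverse_mod(a, 7) for a in range(1, 7)}
field_moduli = [n for n in range(2, 12) if every_nonzero_element_invertible(n)]

print(inverses_mod_7)   # {1: 1, 2: 4, 3: 5, 4: 2, 5: 3, 6: 6}
print(field_moduli)     # [2, 3, 5, 7, 11]
print(inverse_mod(2, 6))  # None: 2 has no inverse mod 6
```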

Recall that subtraction in an abelian group is defined by a - b = a + (-b),
where -b is the inverse of b for addition. In an entirely analogous way we can
define division in a field K. Namely, if a and b are elements of K, with b ≠ 0,
then we put


9.2     a/b = ab^{-1}        (b ≠ 0)


(The element a/b is often displayed as a fraction with a above b, or written
a ÷ b, and is called the quotient of a by b. The
operation is neither commutative nor associative in general.) Taking a or b to
be 1 in (9.2) and recalling that 1^{-1} = 1, we get


9.3     a/1 = a        1/b = b^{-1}        a/b = a · (1/b)


From (9.2) we have further (ab^{-1}) · b = a(b^{-1}b) = a and c · (a/b) = c(ab^{-1}) =
(ca)b^{-1}, and we get the elementary rules


9.4     (a/b) · b = a        c · (a/b) = (ca)/b


THEOREM 9.5 If a and b are elements of a field K, with b ≠ 0, then the element of K
denoted by a/b is the unique solution of the equation bx = a.
Proof. It is a solution of the equation, by (9.4), and it is unique by
the cancellation law.


THEOREM 9.6 Let a, b, a', b' be elements of a field K, with b and b' not zero. Then
a/b = a'/b' if and only if ab' = a'b; and that is so if and only if there is an element
c ≠ 0 in K such that a' = ca and b' = cb.

Proof. By definition a/b = a'/b' means ab^{-1} = a'b'^{-1}. Multiplying
by bb' and keeping in mind that multiplication in a field is commutative,
we get ab' = a'b. Conversely, if ab' = a'b, then multiplying by b^{-1}b'^{-1}
yields a/b = a'/b'.

Now define c = b'/b. Then cb = b', by Eq. (9.4), and c ≠ 0. If
ab' = a'b, then multiplying by b^{-1} we get ab'b^{-1} = a'. But b'b^{-1} = c,
and so we have a' = ca. Conversely, if a' = ca and b' = cb, then a'b =
cab and ab' = cab, whence ab' = a'b. Q.E.D.
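Theorems 9.5 and 9.6 can be tested exhaustively in a small field. The sketch below assumes the field of integers modulo 7 (a choice made for this illustration) and defines the quotient exactly as in the text, as a times the inverse of b; the names `inv` and `div` are hypothetical helpers.

```python
from itertools import product

P = 7  # hypothetical prime; arithmetic is taken mod 7
K = range(P)
NONZERO = range(1, P)

def inv(b):
    """Multiplicative inverse of a nonzero b, found by search."""
    return next(x for x in NONZERO if (b * x) % P == 1)

def div(a, b):
    """The quotient a/b, defined as a times the inverse of b."""
    return (a * inv(b)) % P

# Theorem 9.5: a/b is the one and only solution x of bx = a.
unique_solution = all(
    [x for x in K if (b * x) % P == a] == [div(a, b)]
    for a in K for b in NONZERO
)

# Theorem 9.6, first part: a/b = a'/b' exactly when ab' = a'b.
cross_multiplication = all(
    (div(a, b) == div(a2, b2)) == ((a * b2) % P == (a2 * b) % P)
    for a, a2 in product(K, repeat=2)
    for b, b2 in product(NONZERO, repeat=2)
)

# Theorem 9.6, second part: equal quotients differ by a common nonzero factor c.
common_factor = all(
    (div(a, b) == div(a2, b2))
    == any(a2 == (c * a) % P and b2 == (c * b) % P for c in NONZERO)
    for a, a2 in product(K, repeat=2)
    for b, b2 in product(NONZERO, repeat=2)
)

print(unique_solution, cross_multiplication, common_factor)
```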


THEOREM 9.7 Let a, b, c, d be elements of a field K, with b and d not zero. Then


9.5     a/b + c/d = (ad + bc)/(bd)
9.6     (a/b) · (c/d) = (ac)/(bd)


and if also c ≠ 0, then


9.7     (a/b)/(c/d) = (ad)/(bc)


Proof. First note that bd ≠ 0 because b ≠ 0 and d ≠ 0 (Theorem 9.4).
To prove the first equality we have, using (9.2), (9.3), (9.4),


        bd · (a/b + c/d) = bd · (a/b) + bd · (c/d)
                         = da(b^{-1}b) + bc(d^{-1}d) = ad + bc

But also

        bd · ((ad + bc)/(bd)) = ad + bc


Therefore both sides of the first equation are solutions of bd · x = ad +
bc. They are therefore equal, by Theorem 9.5. The other equations follow
from similar arguments, and their proof is left as an exercise.
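The argument for the rule on adding quotients can be replayed numerically: both a/b + c/d and (ad + bc)/(bd) are checked to solve the equation bd · x = ad + bc, and then compared. This is a sketch under assumptions: the field is taken to be the integers modulo the prime 11, and the inverse is computed by raising to the power P - 2, a standard fact about prime moduli (Fermat's little theorem) that is not part of the text.

```python
from itertools import product

P = 11  # hypothetical prime modulus

def inv(b):
    """Inverse of a nonzero b mod P, using b^(P-2) * b = 1 (mod P) for prime P."""
    return pow(b, P - 2, P)

def div(a, b):
    """The quotient a/b, defined as a times the inverse of b."""
    return (a * inv(b)) % P

def sum_rule_holds(a, b, c, d):
    """Check a/b + c/d = (ad + bc)/(bd) by way of the equation bd * x = ad + bc."""
    lhs = (div(a, b) + div(c, d)) % P
    rhs = div((a * d + b * c) % P, (b * d) % P)
    target = (a * d + b * c) % P
    both_solve = (b * d * lhs) % P == target and (b * d * rhs) % P == target
    return both_solve and lhs == rhs

all_cases = all(
    sum_rule_holds(a, b, c, d)
    for a, c in product(range(P), repeat=2)
    for b, d in product(range(1, P), repeat=2)
)

print(all_cases)
print(div(1, 2), div(1, 3), div(5, 6))  # 1/2 + 1/3 and 5/6 agree mod 11
```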


DEFINITION 9.6 Let A be a subset of a field K. If A, equipped with the operations
of K, is a ring, then A is called a subring of K; if a subring A happens to be a field, 
then it is called a subfield of K. 


EXERCISES 

8. Show that the nonzero elements of a field K, with multiplication as binary 
operation, form an abelian group. 

9. Show that the zero and unit elements of a subring A of a field K are the
same as those of K. Show that if A is a subfield of K, then the inverse of a nonzero
element in A (for either multiplication or addition) is the same as its inverse
in K. Prove that a subring of a field is an integral domain.

10. If a, b, c, d are elements of a field with a/b = c/d, prove the following:

(a) b/a = d/c                              (b) a/c = b/d
(c) (a + b)/b = (c + d)/d                  (d) (a - b)/b = (c - d)/d
(e) (a + b)/(a - b) = (c + d)/(c - d)      (f) a/b = (a + c)/(b + d)

assuming of course that the various denominators are not zero.


Rings, integral domains, the integers 


1. Introduction 


The system of the natural numbers 1, 2, 3, etc., is unquestionably the most important
mathematical system. It is also the most familiar one, and the beginning
student may wonder what there is to say about it that he does not already know.
Yet that system has such an extraordinarily rich and complex structure that it is
still the source of some of the deepest and most challenging problems of mathematics.
It must therefore be counted a singular fact that the natural-number
system can be described with the utmost precision and brevity by a few simple
axioms.

The simplest set of axioms for the natural-number system, and one of the first, 
was published in 1889 by the Italian mathematician and logician Giuseppe Peano. 
His axioms, five in number, are usually called Peano’s postulates. Unfortunately 
the road from those postulates to the study of the properties of numbers is a rather 
lengthy one, and for the sake of brevity we shall give a slightly longer list of axioms 
which will lead us to our goal more quickly. Our axioms will describe the system
of all integers 0, ±1, ±2, etc., rather than the positive integers, i.e., natural numbers.
Several of these axioms will be very useful in describing other mathematical
systems.

In order to understand the aim of this chapter and of some of the later ones, 
the student should try to answer the following questions: Just how does the 
natural-number system differ from the rational-number system (whole numbers 
and fractions), and how do these systems differ in turn from the real- and complex- 
number systems? What features do the four systems have in common? Reason- 
able answers to these apparently simple questions are in fact not so easy to give; 
one of our tasks will be to supply some of the answers. 


2. Systems with two binary operations: 
rings and integral domains 
Whole numbers can be combined by means of the familiar operations of addition,
subtraction, multiplication, and in certain cases, also by division. There appear
to be four binary operations here, but we shall soon see that two of them
(subtraction and division) are easily defined in terms of the other two (addition and
multiplication). We wish to investigate the relation between the latter two
operations, and in order to achieve a setting which will also serve us for the
rational-, real-, and complex-number systems, we shall begin by studying briefly
abstract systems with two binary operations. In this and later chapters we shall
encounter many important examples of rings, which we now define.


DEFINITION 2.1 A set A is called a ring† if it is equipped with two binary operations
+ and × such that the following axioms hold:
(1) A, with the operation +, is an abelian group.
(2) The operation × is associative.
(3) A contains an identity element e for the operation ×:
        e × a = a × e = a        for any element a of A.
(4) a × (b + c) = (a × b) + (a × c) and (b + c) × a = (b × a) + (c × a)
        for any elements a, b, c of A (distributive axiom).
We shall encounter many important examples of rings.

According to Definition 6.1, Chap. 1, A must contain an identity element for
the operation +. We will denote this identity element by 0; thus a + 0 = a.
And every element a has an inverse -a with the characteristic property a +
(-a) = 0. Recall that a - b stands for a + (-b). We call a + b the sum of
a and b, a - b their difference, and the operation + addition.

For the present we do not require that the second operation × be commutative,
although that assumption will soon be added. (If × happens to be commutative,
then we call A a commutative ring.) We shall call the operation × multiplication;
the element a × b will be called the product of a and b, and to simplify the notation
we shall generally write it as a · b or ab. Axiom (3) states that A contains an
identity element e for multiplication; e is usually called the unit element of the ring.
Axiom (4) supplies the only connection between the two binary operations +
and ×.

Note that the introduction of the terms “addition” and “multiplication” for the 
operations of A is purely a matter of convenience. We do not mean to imply that 
the elements of A necessarily have any connection with numbers, or that the two 
operations + and × have any connection with addition and multiplication of
numbers, except in so far as axioms (1) to (4) hold in both cases. 


NOTATION In order to minimize the use of cumbersome parentheses we shall adopt
the following well-known convention: in any expression without parentheses
involving combinations of elements of A by both + and ×, we understand that
the × operations are to be performed first, and then the + operations. For
example, ab + c stands for (ab) + c [not a(b + c)!]; abc - a'b' + cd stands for
(abc) - (a'b') + (cd). Thus the distributive axiom can be written simply as


        a(b + c) = ab + ac
        (b + c)a = ba + ca


† In some texts axiom (3) is not included in the definition of a ring. The rings of the type
we consider, with axiom (3) in force, are sometimes called rings with unit element.


Since both + and × are associative, we do not need parentheses to indicate grouping
of terms, although we still need them of course if the order in which the operations
are to be performed is not covered by the above convention, as in a(b + c).
The expression abc stands either for a(bc) or (ab)c, as explained in Chap. 1. We
now prove some very useful theorems which are true for any ring A.


THEOREM 2.1 The unit element of a ring A is unique. 
This has already been proved in Theorem 3.1, Chap. 1. 


THEOREM 2.2 a · 0 = 0 · a = 0 for any element a of A.

Proof. For any element b in A we have b + 0 = b, and therefore
(b + 0) · a = b · a. On the other hand, by axiom (4), (b + 0) · a =
b · a + 0 · a, and so b · a + 0 · a = b · a. Adding -(b · a) to both sides,
we obtain 0 · a = 0.

Similarly, a · (b + 0) = a · b + a · 0 and a · (b + 0) = a · b; so a · b +
a · 0 = a · b. Adding -(a · b) to both sides, we obtain a · 0 = 0. (Naturally,
if the operation × is commutative, the second part of this proof is
unnecessary.)


THEOREM 2.3 (-a) · b = a · (-b) = -(a · b) for any elements a, b of A.

Proof. We have a + (-a) = 0. Taking the product with b on the
right of both sides of this equality, we obtain (a + (-a)) · b = 0 · b.
Using axiom (4) on the left-hand side and Theorem 2.2 on the right, we
obtain a · b + (-a) · b = 0. Hence, by uniqueness of inverses, (-a) · b =
-(a · b).

The proof that a · (-b) = -(a · b) is quite similar and is left as an
exercise.


The results of Exercise 2, Theorem 2.3, and the rule —(—a) = a, obtained in 
Sec. 8, Chap. 1, embody all the familiar rules of signs used in algebraic computa- 
tions. These rules are thus seen to be necessary consequences of simple assump- 
tions about the binary operations. 

From Theorem 2.3 it follows that there is no ambiguity in writing —ab for 
—(ab), and we shall usually use that simpler notation. 


DEFINITION 2.2 A being a ring, let B be a subset of A. Then B is called a subring
of A if
(i) B contains the unit element e of A.
(ii) For any two elements a, b of B the elements a + b, a - b, and ab are also in B.
It is easy to verify that under these conditions the subset B, equipped with the
two binary operations + and × in A, is itself a ring.
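Both conditions in the definition of a subring matter, as a small experiment shows. In the sketch below the ring A is taken to be the set of pairs of integers modulo 3 with componentwise operations; that this is a ring with unit element (1, 1) is assumed for the illustration, and the subsets tested are hypothetical examples.

```python
from itertools import product

N = 3  # hypothetical modulus
A = list(product(range(N), repeat=2))  # pairs (x, y) with componentwise operations
UNIT = (1, 1)                          # unit element of A

def add(x, y):
    return ((x[0] + y[0]) % N, (x[1] + y[1]) % N)

def sub(x, y):
    return ((x[0] - y[0]) % N, (x[1] - y[1]) % N)

def mul(x, y):
    return ((x[0] * y[0]) % N, (x[1] * y[1]) % N)

def satisfies_ii(B):
    """Condition (ii): sums, differences and products stay in B."""
    B = set(B)
    return all(add(x, y) in B and sub(x, y) in B and mul(x, y) in B
               for x in B for y in B)

def is_subring(B):
    """Conditions (i) and (ii) together."""
    return UNIT in set(B) and satisfies_ii(B)

diagonal = {(x, x) for x in range(N)}     # contains (1, 1)
first_axis = {(x, 0) for x in range(N)}   # closed, but does not contain (1, 1)

print(is_subring(diagonal))
print(satisfies_ii(first_axis), is_subring(first_axis))
```

The second subset satisfies condition (ii) but not condition (i), so it is not a subring in the sense of the definition.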




Just as with groups in Chap. 1, we may study mappings of one ring to another
which are “compatible” with the two operations.


DEFINITION 2.3 A mapping f of one ring A to another ring A' is called a homomorphism
(or, more precisely, a ring-homomorphism) if, for any elements a and b of A,
the following equations hold:


        f(a + b) = f(a) + f(b)
        f(ab) = f(a) · f(b)


Let us compare this new notion of homomorphism to that already defined in
Chap. 1. If we ignore the multiplication in the rings A and A', we may consider
them as abelian groups. The first equation above is then precisely the requirement
that f be a homomorphism of A to A' (considered as groups). Therefore, if 0
denotes the zero element of A, as well as that of A', we may conclude from Eqs.
(8.12) and (8.14), Chap. 1, that


        f(0) = 0
2.1     f(-a) = -f(a)
        f(a - b) = f(a) - f(b)


Note that f does not necessarily satisfy f(e) = e', where e' is the unit element of A'.
Indeed the mapping defined by f(a) = 0 for every element a of A is a ring-homomorphism.
A one-to-one homomorphism is called an isomorphism or, more precisely,
a ring-isomorphism.


The following definition describes some special kinds of rings which will be our 
chief concern in this chapter. 


DEFINITION 2.4 A ring A is an integral domain if it satisfies the following axioms:
(5) The × operation is commutative.
(6) If a, b, c are any elements of A, with c ≠ 0, and if ac = bc, then a = b.
Axiom (6) is naturally called the cancellation axiom or law.
Integral domains are clearly special cases of rings, since they are rings which
satisfy certain additional conditions.


THEOREM 2.4 A commutative ring A is an integral domain if and only if for any
elements a, b of A we have ab ≠ 0 unless a = 0 or b = 0.

Proof. Suppose first that A is an integral domain, so that the cancellation
law holds in A; and suppose that ab = 0, with say a ≠ 0. We can
write our equation as ab = a · 0, and so from the cancellation law there
follows b = 0.

Conversely suppose that a product in A cannot be zero unless one of
the two factors is zero. We show that the cancellation law holds. That
is, if ac = bc, then ac - bc = 0, or (a - b) · c = 0. If c ≠ 0, then by our
assumption the first factor a - b is zero. That is, a = b. Q.E.D.


EXERCISES 
1. Prove that a ring contains only one unit element. 
2. Prove that if a, b, c, and d are elements of a ring A, then
(a) (-a) · (-b) = a · b
(b) (-e) · a = a · (-e) = -a
(c) (-a) · (-b) · (-c) = -(a · b · c)
(d) (a + b) · (c + d) = ac + bc + ad + bd
3. Let A be a ring. A 2 × 2 matrix with coefficients in A consists of four elements
a, b, c, d of A written in a square array


        ( a  b )
        ( c  d )

Let M be the set of all such 2 × 2 matrices. We define two binary operations in
M as follows:


        ( a  b )   ( a'  b' )   ( a + a'   b + b' )
        ( c  d ) + ( c'  d' ) = ( c + c'   d + d' )

        ( a  b )   ( a'  b' )   ( aa' + bc'   ab' + bd' )
        ( c  d ) × ( c'  d' ) = ( ca' + dc'   cb' + dd' )

(A brief inspection shows that in the second equation each of the four entries in
the right-hand side is obtained by combining a certain row of the first matrix with
a certain column of the second, in the manner indicated.) Prove that M with
these two operations is a ring. What are the zero and unit elements of M? Is M
an integral domain? (M is an example of a matrix algebra; such systems are very
important in many applications and will be studied in some detail in later chapters
of this book.)

4. Let A be a ring, and let U be the subset of A consisting of all elements of A
having inverses for the multiplication operation ×. Prove:
(a) If x, y are in U, so is xy.
(b) U, with the binary operation ×, is a group.

5. Let h be a ring-homomorphism of a ring A to an integral domain A'. Assume
that h is not identically zero [i.e., there is at least one element a in A such that
h(a) ≠ 0]. Prove that h(e) = e', where e' is the unit element of A'.

6. Prove in detail that a subring of a ring (see Definition 2.2) is itself a ring. 

7. Let h be a homomorphism from a ring A to a ring A’, and let B be the subset 
consisting of all elements of A that are sent by h into the zero element of A'.
Prove that B satisfies axioms (1), (2), (4), Definition 2.1, with the same operations 
as in A (B is called the kernel of h). 


3. Ordered integral domains 


The nonzero whole numbers can be split into two sets, one consisting of the
positive numbers and the other consisting of the negative numbers. Moreover the
numbers can be ordered according to magnitude; for example, 3 is smaller than 7
(usually written 3 < 7). In this section we shall investigate that kind of situation
in a general framework which will also be of use in the study of the rational- and
real-number systems.

DEFINITION 3.1 An integral domain Z is called an ordered integral domain if its nonzero
elements are split into two subsets J and J' such that the following conditions hold:
(1) J' consists of the inverses (with respect to the + operation) of the elements of J.
(2) If a, b are any elements of J, then a + b is also in J.
(3) If a, b are any elements of J, then ab is in J.

Observe first of all that if J' consists of the inverses (for +) of the elements of J,
as required by axiom (1), then reciprocally J consists of the inverses of elements
of J', since -(-x) = x. The requirement of axiom (1) is therefore not altered if
we interchange the roles of J and J'. Since inverses are unique, it follows moreover
that the mapping that sends x into -x establishes a one-to-one correspondence
between J and J'.

Similarly axiom (2) is really symmetrical in J and J'. For let a, b be elements
of J'. Then, as just noted, -a and -b are in J. Therefore (-a) + (-b) =
-(a + b) is in J, by axiom (2), and so the inverse element a + b is in J', by
axiom (1). Hence axiom (2) holds if J is replaced by J'. The situation is quite
different with axiom (3), however. We shall in fact show that the product of
two elements of J' is in J.


DEFINITION 3.2 Z being as in Definition 3.1, the elements of J will be called positive,
those of J' negative. If a, b are two elements of Z, then we say that a is smaller than
b (or that b is greater than a) if the difference b - a is in J, and in that case we write
a < b or b > a.

With this notation we can rewrite axioms (2) and (3) in the more familiar form
(2') If a > 0 and b > 0, then a + b > 0
(3') If a > 0 and b > 0, then ab > 0
for a > 0 means that a - 0 = a is in J, etc. (see Theorem 3.1). Observe that no
meanings are to be attached to “positive,” “less than,” <, etc., other than those
meanings just assigned to them.

In the following theorems we assume that Z, J, and J' satisfy the conditions of
Definition 3.1.
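To see how the order relation is manufactured from the set J alone, one can model the definitions on a finite sample of whole numbers. The sketch below is an illustration under assumptions: J is taken to be the positive whole numbers (the built-in comparison is used only inside `in_J`, to say which numbers belong to J), and `less` is a hypothetical helper implementing Definition 3.2.

```python
def in_J(x):
    """Membership in J for the whole numbers: the positive ones (assumed model)."""
    return x > 0

def less(a, b):
    """a < b in the sense of Definition 3.2: the difference b - a lies in J."""
    return in_J(b - a)

sample = range(-5, 6)
nonzero = [x for x in sample if x != 0]

# Axiom (1): for nonzero x, exactly one of x and -x lies in J.
axiom_1 = all(in_J(x) != in_J(-x) for x in nonzero)
# Axioms (2) and (3): J is closed under addition and multiplication.
axiom_2 = all(in_J(a + b) for a in sample for b in sample if in_J(a) and in_J(b))
axiom_3 = all(in_J(a * b) for a in sample for b in sample if in_J(a) and in_J(b))

# Exactly one of a < b, b < a, a = b holds for each pair.
trichotomy = all(
    [less(a, b), less(b, a), a == b].count(True) == 1
    for a in sample for b in sample
)
# The product of two negative elements is positive.
negative_times_negative = all(
    in_J(a * b) for a in sample for b in sample if less(a, 0) and less(b, 0)
)

print(axiom_1, axiom_2, axiom_3, trichotomy, negative_times_negative)
```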


THEOREM 3.1 J consists of all elements x such that x > 0, and J' consists of all
elements x such that x < 0.
Proof. Since x - 0 = x + (-0) = x + 0 = x, x is in J if and only if
x - 0 is in J. But x - 0 is in J, by definition, if and only if x > 0. Hence
J consists of all elements x such that x > 0.
On the other hand, x is in J' if and only if -x is in J. But -x = 0 - x,
and so x is in J' if and only if 0 - x is in J, that is, 0 > x.


COROLLARY If x is in Z, then x > 0 if and only if -x < 0, and x < 0 if and only
if -x > 0.
Proof. By Theorem 3.1, x > 0 if and only if x is in J, and -x < 0 if
and only if -x is in J'. But, by axiom (1), x is in J if and only if -x is
in J'. The argument for the other case (x < 0) is similar.

THEOREM 3.2 J and J' have no elements in common.

Proof. Suppose that the element x is both in J and in J'. Then,
by axiom (1), x = -y for some y in J. By axiom (2), since x and y are in
J, so is x + y; but x + y = -y + y = 0. This is a contradiction, for J
does not contain 0. Hence the original assumption that x is both in J
and in J' is false, and the theorem is proved.


THEOREM 3.3 If a, b are any elements of Z, then exactly one of the following relations
must hold: a < b or a > b or a = b.
Proof. Consider the element b - a. If b - a ≠ 0, then b - a is in
J or J', but cannot be in both, by Theorem 3.2. Hence the following
cases exhaust all possibilities: b - a is in J, b - a is in J', or b - a = 0.
These cases correspond, respectively, to the cases in the statement of the
theorem, for if b - a is in J', then -(b - a) = a - b is in J and a > b.

This theorem is often called the trichotomy condition.


THEOREM 3.4 If a, b, c are elements of Z such that a < b and b < c, then a < c.
Proof. If a < b, b < c, then b - a and c - b are in J. By axiom (2)
so is their sum (b - a) + (c - b) = c - a. Hence a < c.

THEOREM 3.5 If a, b are elements of Z such that a < b, then a + c < b + c for any
element c in Z.
Proof. If a < b, then b - a is in J. However, b - a = (b + c) -
(a + c), and so (b + c) - (a + c) is in J. Hence a + c < b + c.


THEOREM 3.6 If a, b are elements of Z such that a < b, and if c > 0, then ca < cb.
Proof. If a < b and c > 0, then b - a and c are in J. By axiom (3),
c(b - a) is in J. But c(b - a) = cb - ca, and so cb - ca is in J. Hence
ca < cb.


THEOREM 3.7 If a, b are elements of Z such that a < 0, b < 0, then ab > 0.
Proof. By the corollary of Theorem 3.1, -a > 0 and -b > 0. So, by
axiom (3), (-a)(-b) > 0. But (-a)(-b) = ab by Exercise 2(a), Sec. 2;
hence ab > 0.


This theorem says that the product of two elements of J' is in J, and so J and J'
are not interchangeable in axiom (3).


THEOREM 3.8 If a ≠ 0, then a · a > 0. Furthermore e > 0 provided Z contains
more than one element.

Proof. If a ≠ 0, then either a > 0 or a < 0, by the trichotomy condition
(Theorem 3.3). In either case, aa > 0, by axiom (3) or else by
Theorem 3.7. If Z contains more than one element, then it certainly has
an element a ≠ 0. Since e · a = a, it follows that e ≠ 0, for 0 · a = 0
(Theorem 2.2). Hence by what has just been proved, e · e > 0, and since
e · e = e, we have e > 0.
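Theorem 3.8 places a real restriction on which integral domains can be ordered: the unit element must be positive, and hence so must every sum e + e + ... + e. As a sketch of what this excludes, the code below searches all candidate sets J in the integers modulo 5 (assumed here to be an integral domain; the example is not from the text) and finds that none satisfies the conditions of Definition 3.1.

```python
from itertools import combinations

P = 5  # hypothetical prime; the domain is the integers mod 5
nonzero = set(range(1, P))

def is_positive_set(J):
    """Check the conditions of Definition 3.1 for a candidate set J."""
    J = set(J)
    J_neg = {(-x) % P for x in J}          # axiom (1): J' is the set of inverses
    splits = (J | J_neg) == nonzero        # J and J' together give all nonzero elements
    closed_add = all((a + b) % P in J for a in J for b in J)   # axiom (2)
    closed_mul = all((a * b) % P in J for a in J for b in J)   # axiom (3)
    return splits and closed_add and closed_mul

candidates = [
    set(c)
    for r in range(len(nonzero) + 1)
    for c in combinations(sorted(nonzero), r)
]
orderings = [J for J in candidates if is_positive_set(J)]

print(len(candidates), orderings)  # 16 candidate sets, none of them works
```

The reason is visible in the theorem: if 1 were in J, then so would be 1 + 1 + 1 + 1 + 1, which is 0 modulo 5; and if 4 (that is, -1) were in J, then so would be 4 · 4, which is 1.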


Thus the splitting of the nonzero elements of Z into J and J’ satisfying the 
axioms of Definition 3.1 enables us in a very simple way to define an order relation
< among the elements of Z. And the preceding theorems show that < obeys the 
familiar rules that we learn for the relation “‘less than” among numbers. The great 
advantage of the axiomatic treatment is that it shows us very clearly just what is 
involved in < (namely, the axioms for an ordered integral domain) and it strips 
that relation of any vague and misleading philosophical connotations. Moreover 
we shall be able to apply our conclusions to other systems besides the system of 
integers (to be taken up in the next section). 
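
As a concrete illustration of how the order relation is derived from the set J, the following minimal Python sketch takes the ordinary integers as an assumed model, defines a < b to mean that b - a is in J, and spot-checks Theorems 3.4 to 3.8 on a finite sample. The names `in_J`, `less`, and `check_order_theorems` are hypothetical, and a finite check of this kind is of course no substitute for the proofs.

```python
from itertools import product

def in_J(a: int) -> bool:
    """Assumed model: J is the set of ordinary positive integers."""
    return a > 0

def less(a: int, b: int) -> bool:
    """a < b means, by definition, that b - a is in J."""
    return in_J(b - a)

def check_order_theorems(sample) -> bool:
    """Spot-check Theorems 3.4 to 3.8 on all combinations drawn from sample."""
    sample = list(sample)
    for a, b, c in product(sample, repeat=3):
        if less(a, b) and less(b, c) and not less(a, c):
            return False  # Theorem 3.4 would fail
        if less(a, b) and not less(a + c, b + c):
            return False  # Theorem 3.5 would fail
        if less(a, b) and less(0, c) and not less(c * a, c * b):
            return False  # Theorem 3.6 would fail
    for a, b in product(sample, repeat=2):
        if less(a, 0) and less(b, 0) and not less(0, a * b):
            return False  # Theorem 3.7 would fail
    # Theorem 3.8: a*a > 0 for a != 0, and the unit element (here 1) is positive.
    return all(less(0, a * a) for a in sample if a != 0) and less(0, 1)
```

Calling `check_order_theorems(range(-6, 7))` runs the check on the integers from -6 to 6.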


EXERCISES †

All the exercises below refer to an ordered integral domain. 

1. The notation a ≤ b means that either a < b or a = b; similarly for a ≥ b.
Show that Theorems 3.3 to 3.6 remain valid if <, > are replaced by ≤, ≥, re-
spectively.

2. Prove that a + b > a if b > 0; prove that a + b < a if b < 0.

3. Prove that a < b if and only if a - b < 0. Prove that a > b if and only if
a - b > 0.

4. Prove the following:

(a) If a < b and c < d, then a + c < b + d.

(b) If a ≤ b and c < d, then a + c < b + d.

(c) If a ≤ b and c ≤ d, then a + c ≤ b + d.
5. Show that if a < b, then c - a > c - b for any c; show that -a > -b.
6. For any element a in Z define the symbol |a|, the absolute value of a, as follows:
|a| = a if a ≥ 0; |a| = -a if a < 0. Prove the following properties:

(a) |a| > 0 unless a = 0

(b) |a| = |-a|

(c) -|a| ≤ a ≤ |a|

(d) |a + b| ≤ |a| + |b|

(e) |a + b + c| ≤ |a| + |b| + |c|

(f) |a + b + c + d| ≤ |a| + |b| + |c| + |d|

(g) |a| - |b| ≤ |a + b|

(h) ||a| - |b|| ≤ |a + b|

(i) |ab| = |a|·|b|

[Relations of the types (c) to (h) are called inequalities. (d) is particularly impor-
tant; it is sometimes called the triangle inequality.]


† You may use the theorems established above in working these exercises, rather than going
back to the definitions.


36 Rings, integral domains, the integers Ch. 2, See. 4 


7. Show that
(a) If a < b and c < 0, then ac > bc.
(b) If a < 0, b > 0, then ab < 0.
(c) If a < 0, then a·a·a < 0; and if a > 0, then a·a·a > 0.
(d) If ac < bc and c > 0, then a < b.
(e) If 0 < a < b and 0 < c < d, then ac < bd.
(f) If a > 0 and ab > 0, then b > 0.
(g) If a > 0 and ab < 0, then b < 0.

8. Prove that if a > e and b > e, then ab > e. Prove that if 0 < a < e, 0 <
b < e, then ab < e.

9. Let a be a positive element of Z which has an inverse a^-1 for the × operation.
Show that a^-1 > 0. Prove that if a > e, then a^-1 < e; and if a < e, then a^-1 > e.

*10. For any a, b in Z prove that
a·a + b·b ≥ |ab| + |ab|
[Hint: Consider the product (a - b)·(a - b).]

11. Show that axiom (6) of Definition 2.4 is superfluous in the definition of an 
ordered integral domain. That is, show that the cancellation law follows from the 
other axioms. [Hint: Apply the trichotomy condition to a - b.]

12. Let Z be an integral domain and suppose there is defined a relation < among 
the elements of Z satisfying the assertions of Theorems 3.3 to 3.6. Show that the
nonzero elements of Z can then be split into two subsets J and J' satisfying axioms
(1) to (3) of Definition 3.1.

*13. Given positive elements x and y in an ordered integral domain such that
xy = e, show that x + y ≥ e + e. Furthermore, x + y = e + e if and only if
x = y = e.


4. The system of integers 


There are many mathematical systems which are ordered integral domains. Of 
greatest importance are the system of integers and the rational- and real-number 
systems. In this section we show that the system of integers can be distinguished 
from all other ordered integral domains by the addition of one new axiom. For 
convenience of reference we restate all the axioms here, and in the rest of this 
chapter we shall follow the new numbering given below. We also make some slight 
notational changes at this point: For the unit element e of our system we now 
write 1, and boldface letters Z, J, J' will be used to prevent confusion with the
notation of earlier sections.
AXIOMS FOR THE INTEGERS Z is a set of elements equipped with two binary
operations + and × satisfying the following conditions:

(1) Z with the + operation is an abelian group.

(2) The operation × is associative and commutative.

(3) Z contains an identity element 1, different from 0, for the × operation.

(4) The operations + and × satisfy the distributive law
a·(b + c) = ab + ac for any elements a, b, c in Z.

(5) The nonzero elements of Z are divided into two subsets J and J' such that J'
consists of the inverses (with respect to +) of the elements of J.

(6) If a and b are in J, then so are a + b and ab.

(7) If U is a subset of J such that 1 is in U and such that x + 1 is in U whenever
x is in U, then U = J.


As we have indicated above, there is essentially just one system which satisfies 
these axioms, and we shall take that system to be the system of integers, or whole 
numbers. We must of course give some kind of justification for doing so. That is, 
we must show that the system determined by axioms (1) to (7) somehow
corresponds to our intuitive idea of the integers. Now (using the results of Sec. 3) the
first six axioms simply say that our system has two operations (+ and ×) and an
order relation (<)† which obey all the rules that we were obliged to memorize in
elementary school. [The condition 1 ≠ 0 in axiom (3) merely means that Z must
contain more than the single element 0.] Axiom (7) looks a little more complicated,
but a brief examination of it discloses that the axiom merely expresses the intuitive
idea that no part of J other than J itself can contain all the elements 1, 1 + 1,
1 + 1 + 1, etc. In other words, J consists precisely of all the elements 1, 1 + 1,
1 + 1 + 1, etc. That is clearly what we have in mind when we think of the posi-
tive integers. But we cannot state axiom (7) in that intuitively suggestive fashion
because we would first have to give a precise meaning to "etc." (or some equivalent
of it), and in the long run that would be more complicated than axiom (7). We
shall return to this matter in Sec. 5.

It should be mentioned that the system of integers is an integral domain. The 
only axiom for an integral domain that is not explicitly contained in the axioms 
for the integers is the cancellation law, which we now prove. Accordingly,
suppose that ac = bc with c ≠ 0. We want to prove that a = b. If a ≠ b, then by
axiom (5) either a - b > 0 or b - a > 0. For definiteness, say a - b > 0.
Similarly either c > 0 or -c > 0, and for definiteness say c > 0. Then by axiom
(6), (a - b)·c > 0. But (a - b)·c = ac - bc = 0, a contradiction. Thus
a = b, and the same conclusion results if -c > 0.

In this section we establish some basic properties of Z and we state precisely 
what we mean when we say that the axioms above determine Z in an essentially 
unique way (Theorem 4.4). The proof (tedious but not really very difficult) is
deferred until Sec. 11.

We recall first that the unit element 1 must be positive; that is, 1 must be in J 
(Theorem 3.8). 


THEOREM 4.1 If x is in J, then either x = 1 or x > 1. In other words, 1 is the
smallest positive element of Z.
† We continue with the definition of < laid down in Sec. 3.



Proof. We define a subset U of J as follows: U consists of 1 itself and 
of all the elements of J which are greater than 1. Our proof will consist 
in showing that U = J, and to achieve this, we will have to use axiom (7). 

Let x be an element of U. Then either x = 1 or x > 1. We must show
that x + 1 is an element of U. If x = 1, then x + 1 = 1 + 1 and
1 + 1 > 1 (why?). On the other hand, if x > 1, then x + 1 > 1 + 1,
by Theorem 3.5, and so x + 1 > 1, by Theorem 3.4. In either case
x + 1 > 1, and therefore x + 1 is in U. Hence, by axiom (7), U = J.


THEOREM 4.2 If x < y, then x + 1 ≤ y. In other words, there can be no element of
Z between x and x + 1.
Proof. If x < y, then y - x is in J. By Theorem 4.1, y - x ≥ 1 and
so, by Theorem 3.5, y ≥ x + 1.


THEOREM 4.3 Let T be any subset of J containing at least one element. Then T
contains a smallest element. That is, there is in T a unique element y such that y < z
for any other element z in T.
Proof. If T contains 1, then 1 is clearly the smallest element in T,
by Theorem 4.1. Suppose then that 1 is not in T. Let U be the subset of
J consisting of all elements of J that are smaller than every element of T.
Since 1 is not in T, 1 has that property and so 1 is in U.
Now let x be an element in U and consider x + 1. Let y be any element
of T. By the definition of U, x < y and therefore x + 1 ≤ y, by Theorem
4.2. Suppose that x + 1 = y. We claim that then y must be the smallest
element of T. Indeed, let z be any other element in T. Then by the above
argument we have similarly x + 1 ≤ z. But we cannot have x + 1 = z
for otherwise y = z. Hence x + 1 < z, showing that y = x + 1 is the
smallest element in T. Suppose on the other hand that the following case
occurs: for no x in U do we have x + 1 = y. Then x + 1 < y, and there-
fore x + 1 is in U whenever x is in U. Hence, by axiom (7), U = J. But
that is impossible unless T contains no elements at all. We have therefore
arrived at a contradiction, showing that the last case considered is im-
possible. But in every other case T has a smallest element. Q.E.D.
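
The proof above is constructive in spirit, and its steps can be sketched in Python. The sketch below assumes that T is given as a finite, non-empty collection of ordinary positive integers; the function name `least_element` is hypothetical. It walks through the elements of U (those smaller than every element of T) until x + 1 lands in T.

```python
def least_element(T):
    """Least element of a non-empty collection T of positive integers."""
    members = set(T)
    if not members:
        raise ValueError("T must contain at least one element")
    if any(t < 1 for t in members):
        raise ValueError("T must be a subset of the positive integers")
    x = 1
    if x in members:
        return x  # 1 is the smallest element of J (Theorem 4.1)
    # Invariant: x is in U, that is, x is smaller than every element of T.
    while x + 1 not in members:
        x += 1  # here x + 1 < y for every y in T, so x + 1 is again in U
    return x + 1  # x + 1 <= y for every y in T, and x + 1 itself is in T
```

The loop terminates because T is non-empty, which mirrors the step in the proof where U = J is shown to be impossible.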


The assertion of Theorem 4.3 is usually expressed by saying that J is well-ordered, 
meaning simply that every non-empty subset of J has a least element.

For some purposes it is convenient to introduce the term segment. Let a be an
element of J. By the segment [1, a] we mean the subset of J consisting of all
elements x of J such that x ≤ a. By Theorem 4.1, 1 ≤ a, and so 1 is in the segment
[1, a]. Naturally a ≤ a, and so a is also in the segment [1, a]. The segment [1, 1]
consists of 1 alone. The segment [1, 1 + 1] consists of the elements 1, 1 + 1 by
Theorem 4.2.
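
For readers who like to experiment, a segment can be modeled directly in Python. This is only a sketch using the ordinary integers as an assumed model; `segment` and `contained_in` are hypothetical names.

```python
def segment(a):
    """Elements of the segment [1, a], listed in increasing order."""
    if a < 1:
        raise ValueError("a must be a positive integer")
    return list(range(1, a + 1))

def contained_in(a, b):
    """True when the segment [1, a] is wholly contained in [1, b]."""
    return set(segment(a)) <= set(segment(b))
```

For instance, `segment(1)` is the list containing 1 alone, and `segment(2)` lists 1 and 1 + 1.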

We have already mentioned that axioms (1) to (7) determine the system of
integers Z in an essentially unique way. A precise statement of this uniqueness
theorem follows.


THEOREM 4.4 Let Z and Z' be two systems satisfying axioms (1) to (7). Then there is
one and only one ring-isomorphism of Z to Z'. This isomorphism makes the zero and
unit elements of Z and Z' correspond, and furthermore it preserves order.

The proof will be given in Sec. 11.


EXERCISES 

1. Let b be an element of Z, and let E be the set of all elements x of Z such that
x > b. Prove that E is well-ordered.

2. Prove that if xy = 1, then x = y = 1 or x = y = -1.

3. Prove that the segment [1, 1 + 1 + 1] consists of 1, 1 + 1 and 1 + 1 + 1.

4. Prove that, given two segments [1, a] and [1, b], one is wholly contained in the
other.

5. Prove that a + 1 is the only element contained in [1, a + 1] but not in [1, a].

6. A non-empty subset T of Z is bounded from above if there is an element z such
that x ≤ z for any element x of T. Prove that, if T is bounded from above, it
contains a greatest element y, that is y ≥ x for any element x of T. Compare this
result with Theorem 4.3.


5. Some comments 


Theorem 4.4 states that any two systems satisfying axioms (1) to (7) have precisely 
the same structure; they are completely interchangeable. We henceforth suppose 
that one such system is fixed once and for all. We call it the system of integers, and 
we shall always denote it by Z. The elements of Z will of course be called integers. 
The set of positive elements in Z will be denoted by J, and its elements will also 
be called natural numbers. For elements of J we introduce the following notation: 


1 + 1 = 2              1 + 1 + 1 + 1 + 1 = 5
1 + 1 + 1 = 3          1 + 1 + 1 + 1 + 1 + 1 = 6
1 + 1 + 1 + 1 = 4      1 + 1 + 1 + 1 + 1 + 1 + 1 = 7


and so on in the familiar fashion. As explained earlier, axiom (7) states in effect
that J consists of all the elements 1, 2, 3, etc.

We have stated that there cannot be two essentially different systems satisfying 
axioms (1) to (7), but we have not shown that there can exist even one such system. 
That is, we have not shown that our axiom set is free from internal inconsistencies. 
It is possible to do so, but we shall not enter into that matter here. 

The reader may protest that Z is after all just the system of integers he studied 
in elementary school, and therefore the system must obviously exist! Are not 
integers—at least positive integers—essentially objects of Nature? And what is 
the point of going through all the axioms anyway? 



Let us examine these rhetorical queries for a moment. It is certainly true that
Z is just the system the student has studied in elementary school. And he may do
well therefore to ask himself just what he learned there about the integers.
Probably he learned to make numerical calculations, but very little beyond that. And
the mere fact that one has studied something in elementary school is certainly no
real guarantee that it is free from contradictions. Now what about the relation
of integers to Nature? If you but look around you will find many instances of, say,
the number 2: a pair of shoes, a pair of dice, etc. But you will not find the number
2, or any other number for that matter. The integer 2 and all the other integers are
concepts, or constructions of the human intelligence. It may be worth noting that
in some primitive societies there are no integers greater than 2. The integer 5
simply does not exist for the Hottentots, and there is no word for it in their
language.

The numbers 2 and 5, etc., are abstract concepts and by that fact are rather far
removed from Nature. It is of course true that they have counterparts in Nature; 
we can easily find many concrete instances of 2 or 5 about us, and we can even 
make experimental verifications of the rules of arithmetic for small numbers. 

But the situation is rather different for very large numbers. For example, the 
integer 10° has no known counterpart whatever in Nature. Indeed that number 
is in all probability enormously larger than the total number of electrons, protons, 
and all other particles in the entire universe. It is not possible, even in principle
(much less in practice), to find a concrete instance of 10%. But that fact in no way
hinders us from thinking about 10”, from forming a concept of it, even from cal- 
culating with it. It would clearly be an extremely unsatisfactory solution to banish 
all very large numbers simply because they have no concrete realizations. 

Integers are abstract concepts, and only relatively small integers have anything 
to do with Nature.† Therefore we cannot appeal to Nature for evidence of the
existence of the abstract system of integers, nor can we infer any properties of the 
system from that source. Where then do we turn to find out about the system? 
As we have tried to make plain, the integers are, in an essential way, a product of 
the human mind, with all its frailties and inconsistencies. Can we then be sure 
that Tom, Dick, and Harry all have the same concept of the integers, the same 
understanding of that system? In all likelihood T, D, and H have only the haziest 
notion of the integers, derived chiefly from limited experience with elementary 
numerical computations. T, D, and H cannot help us much. 

The easiest and most reasonable solution to this problem is obviously just to say 
what the system of integers is—to list its properties in a clear and unambiguous 
way. The attentive reader will be aware that that is precisely what we have done 
in axioms (1) to (7). We have listed the properties of the system Z, and we prove 
in Sec. 11 that its structure is thereby completely determined. Furthermore we
have been at pains to point out that our axioms reflect as closely as possible all the
familiar properties that we expect the integers to have.

† That does not mean however that large numbers are useless in the study of Nature.

But there is an important point to be noted. For us, integers are elements of a 
set with certain operations, any elements at all. The individual natures of the
elements of Z are of no interest to us whatever and have nothing at all to do with
the structure of Z. That structure reposes ultimately in the operations + and ×,
not in the peculiarities of the constituent elements. We have thus been able to 
divest integers of any philosophical connotations and any dependence on Nature. 
So to speak, we do not care what integers ‘“‘are,” in any philosophical sense, if 
indeed that implicit question has any meaning. The axiomatic method has there- 
fore allowed us to ignore many very difficult, tiresome philosophical inquiries. 
That is a feature which is characteristic of the mathematical method. That method 
seeks to study only the structures of systems, not their ultimate constituents. 

Before going on to prove some more theorems about integers, we shall make 
brief mention of Peano’s postulates (they are axioms for the positive integers J, 
not for Z). They are listed below, but we have combined two of Peano’s postulates 
into one, so that only four appear instead of the five mentioned in Sec. 1. In these 
postulates, J stands for a set of elements. 


I. To each element x of J there is assigned some element S(x) of J (called the
"successor" of x).
II. If S(x) = S(y) for two elements x, y in J, then x = y.
III. There is an element in J, denoted by 1, which is not the successor of any
element of J.
IV. If U is a part of J containing 1 such that for any element x in U the successor
S(x) is also in U, then necessarily U = J.


These axioms are noteworthy for their simplicity, and in that respect they are 
logically more satisfactory than our axioms (1) to (7). It should be noted that IV 
looks very much like our axiom (7). Observe that Peano’s postulates make no 
mention of addition or multiplication. 

In order to show that these axioms determine a system with the properties that 
we expect of the positive integers, one proceeds as follows: One first shows that 
there can be essentially only one system satisfying the axioms. The proof is quite 
similar to our proof of Theorem 4.4 but is actually rather simpler because fewer 
things are involved. Next one shows that it is possible to construct in J a unique 
binary operation + which is associative and commutative and which satisfies the 
equation S(x) = x + 1 for all x in J. Then one shows in a similar way that it is
possible to construct another unique binary operation × in J such that 1·x = x
and x·(y + z) = xy + xz for any elements of J. Finally, one constructs from J
a larger system Z containing J and satisfying our axioms (1) to (7). 
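
The construction of + and × from the successor mapping can be sketched in Python. The recursion equations used here, x + S(y) = S(x + y) and x·S(y) = x·y + x, are the standard ones and are an assumption of this sketch; the text does not spell them out. Python's positive integers serve as an assumed model of J, with S(x) modeled as x + 1, and the names `S`, `add`, and `mul` are hypothetical.

```python
def S(x):
    """Successor mapping on the assumed model of J."""
    return x + 1

def add(x, y):
    """x + y built from S alone: x + 1 = S(x), x + S(y) = S(x + y)."""
    if y == 1:
        return S(x)
    return S(add(x, y - 1))  # y is the successor of y - 1

def mul(x, y):
    """x * y built from add: x * 1 = x, x * S(y) = x * y + x."""
    if y == 1:
        return x
    return add(mul(x, y - 1), x)
```

On small arguments one can check that `add` is commutative and that `mul` distributes over `add`, as the construction requires.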


EXERCISE
Prove that 2·2 = 4 and 2·3 = 6.


6. Finite and countable sets 


We shall say that a set E is finite if its elements can be put into one-to-one corre-
spondence with some segment [1, n] in J, n being of course a positive integer (seg-
ments are defined at the end of Sec. 4). We say then that E has n elements. The
one-to-one correspondence between [1, n] and E being regarded as fixed, we can
then think of any element x in E as "labeled" by the corresponding integer in the
segment, say j. We can write x_j for x, and in this way we can indicate all the
elements of E by the indexed symbols x_1, x_2, . . . , x_n, and we may speak of x_j as
the jth element of E. This provides a very convenient and compact notation for a
finite set, and it will often be used.

A set which is not finite is naturally called infinite. Among the infinite sets are 
those which are said to be countable, or denumerable.† A countable set E is one
whose elements can be put into one-to-one correspondence with the elements of J.
(In particular, J itself is countable!) Some one-to-one correspondence between
J and E being fixed, we can indicate the element of E corresponding to the positive
integer j by some such symbol as x_j (or whatever other letter seems appropriate),
and we can then indicate the elements of E by writing x_1, x_2, x_3, . . . .

Now let X be any set, not necessarily countable. Suppose that we are given a 
fixed mapping f from the segment [1, n] to X. Just as above, we can designate the 
element f(j) of X corresponding to the integer j of the segment by some such
symbol as x_j. The elements f(1), f(2), . . . , f(n) of X, which we can write as
x_1, x_2, . . . , x_n, are said to be an ordered n-tuple in X, often written simply as
(x_1, x_2, . . . , x_n). The n-tuple is also called a finite sequence in X. Observe that f
here is not required to be one-to-one, and so the n-tuple may very well contain
repeated elements.
Similarly, if f is a mapping of J to the set X, then the elements f(1), f(2), f(3),
. . . in X can be indicated (say) by x_1, x_2, x_3, . . . , and we then speak of an
infinite sequence of elements in X; x_n = f(n) is called the "nth term" of the se-
quence. Again f need not be one-to-one, and so elements of X may be repeated in
an infinite sequence.


EXAMPLE The integers 1·2, 2·3, 3·4, etc., form an infinite sequence whose
nth term is n·(n + 1). The integers 1, 1 + 3, 1 + 3 + 5, 1 + 3 + 5 + 7, etc.,
form an infinite sequence whose nth term is the sum of the first n odd integers
(and is equal to n^2; see Exercise 2, Sec. 7).
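
The two sequences of the example can be written as mappings from the positive integers, which makes it easy to list their first few terms. This is a small Python sketch with hypothetical function names; the numerical agreement with n·n is only an observation on finitely many terms, not a proof.

```python
def nth_product(n):
    """nth term of the sequence 1*2, 2*3, 3*4, ..."""
    return n * (n + 1)

def nth_odd_sum(n):
    """nth term of 1, 1 + 3, 1 + 3 + 5, ...: the sum of the first n odd integers."""
    return sum(2 * k - 1 for k in range(1, n + 1))

# First four terms of each sequence, indexed by the segment [1, 4].
first_products = [nth_product(n) for n in range(1, 5)]
first_odd_sums = [nth_odd_sum(n) for n in range(1, 5)]
```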


† The real numbers constitute an infinite set which is not countable. See Theorem 4.2,
Corollary, Chap. 4.



EXERCISES 
1. Let E and E' be finite sets with no elements in common and containing, re-
spectively, m and n elements. Let E'' be their union, that is, the set consisting of
all elements in either E or E'. Show that E'' contains m + n elements.
2. Let a, b be elements of Z with b > a. Let E consist of all integers x such
that a < x ≤ b. Show that E contains b - a elements.

*3. Let E be a countable set and E' be a finite or a countable set. Show that
the union of E and E' is a countable set.

*4. Let E be a countable set. Prove that any non-empty subset of E is either
finite or countable.

*5. Let E_1, E_2, . . . be a finite or countable collection of sets, each of which is
countable. Show that their union (set of all the elements in all the E_j) is countable.

6. If A is an infinite set, B a finite set, and f a mapping of A to B, show that there
exist distinct elements a, a' in A such that f(a) = f(a').

7. Show that any finite set of integers contains a smallest element and a greatest
element, both unique.

*8. Show that the number of elements in a finite set E is uniquely defined;
show that if there is a one-to-one mapping of a segment [1, n] to E, then that is
the only segment with a one-to-one mapping to E.

*9. Show that a non-empty subset of a finite set is itself finite.




7. Mathematical induction and some of its applications 


7A. Axiom (7) of Sec. 4 is called the axiom of complete induction (also
finite or mathematical induction). It is the basis of a very important method of
proof which we now describe: proof by mathematical induction. Instead of
using axiom (7) directly, however, we shall use Theorem 4.3. [It is an easy matter
to see that axiom (7) and Theorem 4.3 are really equivalent, in the sense that we
could have used the statement of Theorem 4.3 in place of axiom (7), then deducing
the assertion of axiom (7) as a theorem.]


(I) Let there be assigned to every positive integer n a statement P_n which may be
either true or false. Suppose that P_1 is true and suppose that for each n the
statement P_(n+1) is necessarily true whenever the preceding statements P_1,
P_2, . . . , P_n are true. Then all the P_n are true.


For let F be the set of all positive integers n for which P_n is false. By Theorem
4.3 the set F has a least element n_0 if it is not empty, and in that case n_0 > 1 be-
cause P_1 is true by assumption. Then n = n_0 - 1 is a positive integer, and by
definition of F, all the statements P_1, P_2, . . . , P_n must be true. Therefore, by
assumption, P_(n+1) = P_(n_0) is also true, a contradiction. Hence F must be empty.



REMARK. Observe that there is nothing to prevent some of the P’s from being 
true, irrespective of the truth or falsity of the preceding statements. Several ex- 
amples of induction are given below. 

We point out explicitly that (I) above does not have the status of a theorem 
because there are some rather vague things involved (e.g., what is meant by 
“‘statement”). Hence (I) should be regarded as a kind of “model” for induction 
proofs. Formulating the general principle of induction as we have done in (I) 
simply saves us a lot of needless repetition of a frequently used method of proof. 
In applications of (I), the "statements" P_n always have precise mathematical
definitions, and when specific and exact mathematical statements are substituted
for the P_n in (I) and the "proof" we have given for it, the result is of course a
precise mathematical theorem. Frequently the actual procedure of writing out
all the details of an induction proof is rather tiresome, and mathematicians often
abbreviate the whole thing by simply writing "and so on," or some similar
expression.


It sometimes happens that instead of having statements P_n for every positive
integer n we have them for only a finite number of integers, say those in the
segment [1, r] consisting of all integers n such that 1 ≤ n ≤ r. Mathematical
induction can still be applied:


(II) For each integer n in the segment [1, r] let P_n be a statement which may be
either true or false. Suppose that P_1 is true and that P_(n+1) is necessarily true
(for n < r) whenever the preceding statements P_1, P_2, . . . , P_n are true.
Then all the P_n are true, for 1 ≤ n ≤ r.


The argument is exactly the same as that given for (I). 


7B. Before going on to give examples of mathematical induction proofs,
we describe another equally important application of Theorem 4.3—definition by 
induction. Here the problem is one of defining certain mappings from the positive 
integers J to some set E. 
Suppose that we wish to define a mapping f from J to a set E satisfying the follow- 
ing conditions: 


(7.1) f(1) is a given element x of E.

(7.2) For each positive integer n there is given some rule, call it R_n, from which
the element f(n + 1) in E is uniquely determined by the elements f(1), f(2),
. . . , f(n).


For this situation we have a general "mapping principle" analogous to (I) above:


(III) There is one and only one mapping f from J to E which satisfies the conditions
(7.1) and (7.2).



The idea behind this principle is almost obvious: (7.1) tells us what f(1) is to be;
then the rule R_1 tells how to determine f(1 + 1) = f(2) from f(1); the rule R_2 then
tells how f(2 + 1) = f(3) is to be determined from f(1) and f(2); knowing f(1),
f(2), f(3), we can then compute f(4) by means of the rule R_3, and so on. The
mapping f is said to be defined inductively by the given conditions (7.1) and (7.2).
A precise demonstration of (III) involves some minor logical subtleties; we defer
the demonstration until paragraph J below.

We shall give some examples of (III) presently. We first state a slightly modified 
version, analogous to (II) above: Suppose that we want to define a mapping f 
from some segment [1, r] (rather than all of J) to a set E in such a way that the 
following conditions are satisfied: 


(7.3) f(1) is a given element x of E.
(7.4) For each positive integer n < r there is given some rule R_n from which the
element f(n + 1) in E is uniquely determined by the elements f(1), f(2),
. . . , f(n).


The mapping principle for this case is


(IV) There is one and only one mapping f from the segment [1, r] to E which
satisfies (7.3) and (7.4).


Clearly the idea here is essentially the same as for (III) above.

In reference to the rules R_n mentioned above, we point out that R_n might flatly
state what f(n + 1) is to be, without any reference at all to f(1), . . . , f(n), or R_n
might involve only some of the latter elements. These possibilities are illustrated
in the examples below.

Just as with (I) and (II), the principles (III) and (IV) are not exactly theorems,
because they involve the rather imprecise notion of a “rule” R,. We could attempt 
to give a rigorous definition of that notion, but there is no great point to doing so. 
For, in specific applications of (III) or (IV), the rule R, is given by a precise 
mathematical definition, and when that is done (III) and (IV) and their demon- 
stration in paragraph J become precise mathematical statements. 
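
One way to make the notion of a "rule" R_n concrete is to model it as a function that receives n together with the values f(1), . . . , f(n) already determined and returns f(n + 1). The Python sketch below does this for a segment [1, r], since only finitely many values can be computed; the helper `define_inductively` and the two sample rules are hypothetical illustrations, not part of the text.

```python
from typing import Callable, List, TypeVar

T = TypeVar("T")

def define_inductively(x: T, rule: Callable[[int, List[T]], T], r: int) -> List[T]:
    """Return [f(1), ..., f(r)] with f(1) = x and f(n + 1) = rule(n, [f(1), ..., f(n)])."""
    if r < 1:
        raise ValueError("r must be a positive integer")
    values = [x]  # f(1) is the given element x
    for n in range(1, r):
        # The rule R_n sees every earlier value but may ignore some or all of them.
        values.append(rule(n, list(values)))
    return values

# A rule that uses all earlier values: f(n + 1) = f(1) + ... + f(n).
running_sums = define_inductively(1, lambda n, earlier: sum(earlier), 5)

# A rule that flatly states f(n + 1) without consulting earlier values.
flat = define_inductively(1, lambda n, earlier: n + 1, 4)
```

Passing a copy of the earlier values to the rule keeps each rule from altering values that have already been fixed, which matches the requirement that f(n + 1) be determined by, and not change, f(1), . . . , f(n).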


7C. Here we give two simple examples of definition by induction. For
the first example let it be required to find a mapping f from J to J satisfying the
conditions


(7.5) f(1) = 1
      f(n + 1) = (n + 1)·f(n) for n ≥ 1


The rule R_n determining f(n + 1) is particularly simple; it involves only f(n). The
mapping principle (III) declares that there is one and only one mapping f with
these properties, and to write down a precise proof of that one has only to copy
out the argument for (III) in paragraph J, putting in the mathematical condition
(7.5) for the unspecified rule R_n in (7.2). The mapping f defined inductively by
(7.5) is of considerable importance. The integer f(n) is usually indicated by n!
(read n factorial). From (7.5) one readily sees that the first few values are 1! = 1,
2! = 2, 3! = 3·2! = 6, 4! = 4·3! = 24, etc. It is customary to define 0! = 1.

Our second example is of no particular importance; it is merely intended as an 
illustration. Let it be required to define a mapping F from J to J satisfying the 
following conditions: 


(7.6) F(1) = 1, F(2) = 2, and
      F(n + 1) = (n + 1)·F(q) for n > 1, where q is the greatest even
      integer not exceeding n


Here the rule R_1 determining F(2) is simply F(2) = 2. For n > 1 the rule R_n
involves only F(n) if n is even and only F(n - 1) if n is odd. The mapping prin-
ciple (III) asserts that there is a unique mapping F satisfying these conditions.
The first few values of F are as follows: F(3) = 3·F(2) = 6, F(4) = 4·F(2) = 8,
F(5) = 5·F(4) = 40, F(6) = 6·F(4) = 48, etc.
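
The two inductive definitions of this paragraph translate almost line for line into Python. The sketch below is only an illustration on ordinary integers; the function names `factorial` and `F` are chosen to match the text, and the loop variables are hypothetical.

```python
def factorial(n):
    """n! computed from f(1) = 1 and f(k + 1) = (k + 1) * f(k)."""
    value = 1  # f(1)
    for k in range(1, n):
        value = (k + 1) * value  # the rule uses only the preceding value f(k)
    return value

def F(n):
    """F(1) = 1, F(2) = 2, F(k + 1) = (k + 1) * F(q), q the greatest even integer <= k."""
    values = {1: 1, 2: 2}
    for k in range(2, n):
        q = k if k % 2 == 0 else k - 1
        values[k + 1] = (k + 1) * values[q]
    return values[n]
```

The values returned for small arguments agree with those listed in the text.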


7D. We now come to some very important applications of definition by 
induction, and in the course of working them out we shall also have some examples 
of proof by induction. 
Let S be a set with an associative binary operation *. Let x be a fixed but arbi-
trary element of S. We wish to define a mapping, call it g, from J to S by the con-
ditions


(7.7) g(1) = x
      g(n + 1) = g(n) * x for n ≥ 1


From (III) we know that there is a unique mapping g satisfying these require-
ments, and we have g(2) = g(1) * x = x * x, g(3) = g(2) * x = x * x * x, g(4) =
g(3) * x = x * x * x * x, etc. (we omit parentheses because * is supposed to be asso-
ciative). In E and F below we shall develop some important special notation for
the mapping g. First we prove the following basic fact:


(7.8) g(m + n) = g(m) * g(n) for any positive integers m, n


Proof. We prove this by induction, assuming that m is fixed through-
out the discussion. Then let P_n be the statement (7.8). P_1 is true, by
(7.7) with m in place of n. We now show that P_(n+1) is true whenever P_n is
true. We have g(m + n + 1) = g(m + n) * x, by (7.7). If P_n is true,
then g(m + n) = g(m) * g(n), whence g(m + n + 1) = g(m) * g(n) * x,
by the preceding equation. Now g(n) * x = g(n + 1), by (7.7) again,
and so we obtain g(m + n + 1) = g(m) * g(n + 1), which is precisely the
statement P_(n+1). It follows from the principle (I) of mathematical induc-
tion that all the P_n are true. Hence (7.8) holds for all positive integers n.
Since m was arbitrary, it also holds for all positive integers m. Q.E.D.
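
The mapping g of (7.7) and the identity (7.8) can be tried out on any associative operation. The Python sketch below uses string concatenation as an assumed example of an associative (and non-commutative) operation; `g` takes the operation as a parameter, and the names `op` and `concat` are hypothetical.

```python
def g(x, n, op):
    """g(1) = x and g(k + 1) = op(g(k), x), evaluated up to k + 1 = n."""
    value = x
    for _ in range(1, n):
        value = op(value, x)
    return value

def concat(s, t):
    """String concatenation, an associative binary operation on strings."""
    return s + t
```

For example, `g("ab", 3, concat)` evaluates to `"ababab"`, and `concat(g("ab", 2, concat), g("ab", 3, concat))` gives the same string as `g("ab", 5, concat)`, as (7.8) predicts.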


From (7.8) we note (interchanging m and n) that


(7.9) g(m) * g(n) = g(n) * g(m) = g(m + n) for any positive integers m, n


7E. We now put our results in more familiar notation. If the binary
operation in S is written in the usual product notation, xy instead of x * y, then the
element g(n) of (7.7) is denoted by x^n. This symbol is therefore defined for every
element x in S and for every positive integer n. The Eqs. (7.7) can be written


(7.10) x^1 = x
       x^(n+1) = x^n·x


Equation (7.9) becomes


(7.11) x^m·x^n = x^n·x^m = x^(m+n) for all positive integers m, n


If the system S contains an identity element e for the binary operation, then it is customary to define


7.12   $x^0 = e$   for any x in S


The rules (7.11) then hold for all $m, n \ge 0$.
Further, if x has an inverse, call it $x^{-1}$, for the binary operation under consideration, then we define


7.13   $x^{-n} = (x^{-1})^n$   $(n > 0)$


Thus $x^n$ is defined for all integers n, positive, negative, or zero. It is easy to verify that (7.11) remains valid for all integers m, n.
Another important rule for exponents is the following:


7.14   $(x^n)^m = x^{nm}$   for all positive integers m, n


Proof. We use induction on m, the integer n being arbitrary but fixed. Let $P_m$ be the statement (7.14). Then $P_1$ is true, since $y^1 = y$ for any y in S, in particular for $y = x^n$. Suppose that $P_m$ is true for some m. We show that $P_{m+1}$ must also be true. We have


       $(x^n)^{m+1} = (x^n)^m (x^n)$


from (7.10) with $x^n$ in place of x. By assumption, $(x^n)^m = x^{nm}$, and so the right-hand side above is equal to $x^{nm} \cdot x^n$, and this is equal to $x^{nm+n} = x^{n(m+1)}$, which is precisely the statement $P_{m+1}$. Therefore $P_{m+1}$ is true if $P_m$ is true. Since $P_1$ is true, as noted above, all the $P_m$ are true, by (I). That is, (7.14) holds for all positive integers m, and also for all positive integers n, since n was arbitrary in the proof. Q.E.D.




It is clear that (7.14) holds if m = 0 or n = 0, assuming that S contains an identity element e for the operation. Using (7.13) one easily verifies that (7.14) holds for all integers m, n if x has an inverse.
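As a rough check of the conventions (7.10), (7.12), and (7.13), here is a minimal sketch that computes $x^n$ for any integer n in the multiplicative group of nonzero rational numbers. The function name `power` and its default arguments are assumptions made for this illustration only.

```python
from fractions import Fraction


def power(x, n, identity=Fraction(1), inverse=lambda t: 1 / t):
    """Compute x**n for any integer n, following (7.10), (7.12), and (7.13)."""
    if n == 0:
        return identity                                  # (7.12): x^0 = e
    if n < 0:
        return power(inverse(x), -n, identity, inverse)  # (7.13): x^(-n) = (x^(-1))^n
    result = x                                           # (7.10): x^1 = x
    for _ in range(n - 1):
        result = result * x                              # (7.10): x^(k+1) = x^k x
    return result


x = Fraction(2, 3)
for m in range(-4, 5):
    for n in range(-4, 5):
        assert power(x, m) * power(x, n) == power(x, m + n)   # (7.11)
        assert power(power(x, n), m) == power(x, n * m)       # (7.14)
```

The loops confirm (7.11) and (7.14) for a small range of positive, negative, and zero exponents; they are a spot check, not a substitute for the induction proofs.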

As a further property of exponents we have


7.15   $(xy)^n = x^n y^n$   for all positive integers n, provided $xy = yx$

In order to prove this, we show first that

7.16   $x y^n = y^n x$   for all positive integers n, if $xy = yx$


Proof. The statement holds for n = 1, by assumption. If (7.16) holds for some n, then it holds for n + 1. Namely, we have $y^{n+1} = y \cdot y^n$, by (7.11), and so $x y^{n+1} = (xy) y^n = (yx) y^n = y (x y^n)$, since $xy = yx$. Since $x y^n = y^n x$, by assumption, there follows $x y^{n+1} = y (y^n x) = (y y^n) x = y^{n+1} x$, which is (7.16) with n + 1 in place of n. Hence (7.16) holds for all positive integers n, by mathematical induction. Q.E.D.

Proof of (7.15). The equation holds for n = 1, by assumption. Suppose now that it holds for some integer n. We show that it must also hold for n + 1. To do so we have $(xy)^{n+1} = (xy)^n (xy)$, by (7.10). By assumption, $(xy)^n = x^n y^n$. Therefore, $(xy)^{n+1} = x^n y^n x y$. By (7.16) this is equal to $x^n x y^n y = x^{n+1} y^{n+1}$. This shows that (7.15) holds for n + 1 if it holds for n. Therefore (since it holds for n = 1), (7.15) is true for all positive integers n, by mathematical induction. Q.E.D.


Obviously (7.15) and (7.16) hold for n = 0. Equation (7.16) is easily seen to hold for negative n if y has an inverse; (7.15) holds also for negative n if both x and y have inverses (assuming $xy = yx$). For example, we have $y^{-1} x y = y^{-1} y x$, whence $y^{-1} x y = x$. Therefore, $y^{-1} x y y^{-1} = x y^{-1}$, or finally $y^{-1} x = x y^{-1}$. Hence (7.16) holds with $y^{-1}$ in place of y. Similarly, one shows that $x^{-1} y^{-1} = y^{-1} x^{-1}$, and therefore (7.15) is valid with $x^{-1}$ in place of x and $y^{-1}$ in place of y.
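The hypothesis $xy = yx$ in (7.15) cannot be dropped. The sketch below illustrates this with 2-by-2 integer matrices stored as nested tuples; the helper names `matmul` and `matpow` and the particular matrices are chosen only for this illustration.

```python
def matmul(a, b):
    """Product of two 2-by-2 matrices given as nested tuples."""
    return tuple(
        tuple(sum(a[i][k] * b[k][j] for k in range(2)) for j in range(2))
        for i in range(2)
    )


def matpow(a, n):
    """a**n for a positive integer n, by repeated multiplication as in (7.10)."""
    result = a
    for _ in range(n - 1):
        result = matmul(result, a)
    return result


x = ((1, 1), (0, 1))
y = ((1, 0), (1, 1))   # x and y do not commute
z = ((1, 3), (0, 1))   # x and z do commute

# Without commutativity, (7.15) can fail already for n = 2.
assert matmul(x, y) != matmul(y, x)
assert matpow(matmul(x, y), 2) != matmul(matpow(x, 2), matpow(y, 2))

# With commutativity, (7.15) and (7.16) hold.
assert matmul(x, z) == matmul(z, x)
for n in range(1, 6):
    assert matpow(matmul(x, z), n) == matmul(matpow(x, n), matpow(z, n))
    assert matmul(x, matpow(z, n)) == matmul(matpow(z, n), x)
```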


7F. A somewhat different notational convention is used in the case of additive notation. Consider then a set S equipped with an associative binary operation +. We shall assume that + is commutative, since that symbol is rarely used otherwise. In this situation, the element $g(n)$ of (7.7) (with + in place of $\ast$) is denoted by $nx$, or $n \cdot x$, instead of $x^n$. Equation (7.7) then reads


7.17   $1 \cdot x = x$
       $(n + 1) \cdot x = n \cdot x + x$


If S contains an identity element for +, as we shall assume for simplicity, denoting it by 0, then we define


       $0 \cdot x = 0$




the zero on the left being the integer zero, the zero on the right being the 0 in S. The equation is none other than (7.12) in the additive notation. Equation (7.11) transcribed into the new notation is


7.18   $mx + nx = (m + n)x$


this holding for all $m, n \ge 0$. If x has an inverse for the operation +, denote it by $-x$; then we define $(-n)x$ by


       $(-n)x = n(-x)$   (n a positive integer)


This equation is (7.13) in the additive notation. Equation (7.18) then holds for all integers m, n. Equation (7.14) becomes


7.19   $m(nx) = (mn)x$   for all integers $m, n \ge 0$


and this is valid for all integers if x has an inverse.
Since + is assumed to be commutative, Eq. (7.15) in the additive notation is


7.20   $n(x + y) = nx + ny$   for all $n \ge 0$

It holds for n < 0 if x and y have inverses for the operation +.


As a particularly important application, suppose that S is a ring, hence is equipped with both a product and a sum operation. If x is any element of S, then $x^n$ is defined for any integer $n \ge 0$ by (7.10) [or (7.12) for n = 0], and all the rules in paragraph E can be applied. Using the + operation in S we can also define $nx$ for any positive integer n, by (7.17). Since S is an abelian group with the operation +, the element $nx$ is defined for all integers n, by the foregoing conventions, and all the rules of this paragraph hold. We mention the following rules, valid in any ring:


7.21   $(nx)(my) = nm(xy)$   for any integers n, m
7.22   $(nx)^m = n^m x^m$   for any integer n and any integer m > 0

Their proofs are left as exercises in mathematical induction.

7G. Paragraphs 7D, 7E, and 7F were concerned with the notion of applying a binary operation repeatedly to a single element x of S. Here we consider briefly the similar idea of applying the binary operation repeatedly to possibly different elements. We begin by considering an associative binary operation in a set S, and we use first the multiplicative notation.


Let $x_1, x_2, \ldots, x_n$ be an n-tuple of elements in S (see Sec. 6). We seek to define a mapping f from the segment [1, n] to S satisfying the conditions

7.23   $f(1) = x_1$
       $f(k + 1) = f(k) \cdot x_{k+1}$   for $1 \le k < n$




The mapping principle (IV) of paragraph 7B assures us that there is exactly one mapping f satisfying these conditions. We have $f(2) = f(1) \cdot x_2 = x_1 x_2$, $f(3) = f(2) \cdot x_3 = x_1 x_2 x_3$, etc. (we omit parentheses here because of the assumed associativity of the binary operation). It is natural to indicate $f(n)$ by $x_1 x_2 \cdots x_n$, or some similar symbolism. The important point is that the symbol is given an unambiguous meaning by this definition. Comparing (7.10) and (7.23) one sees immediately that if $x_1, x_2, \ldots, x_n$ all happen to be the same element x of S, then $f(n) = x^n$. The new operation therefore includes that of (7.10), and we can write


7.24   $x^n = x x \cdots x$   (n factors)

the right-hand side denoting the quantity $x_1 x_2 \cdots x_n$ when all the factors are equal to x.


Comparing (7.5) and (7.23) we see that


7.25   $n! = 1 \cdot 2 \cdots n$

the right-hand side denoting the quantity $x_1 x_2 \cdots x_n$ defined by (7.23) when $x_k$ is the integer k (k = 1, . . . , n), the operation being multiplication of integers.


Using mathematical induction it is easy to prove various rules for repeated operations. For example,


7.26   $(x_1 x_2 \cdots x_n)(x_{n+1} x_{n+2} \cdots x_{n+m}) = x_1 x_2 \cdots x_{n+m}$

for any (n + m)-tuple of elements in S. If the operation is commutative, then

7.27   $x_{j_1} x_{j_2} \cdots x_{j_n} = x_1 x_2 \cdots x_n$


for any n-tuple of elements, where $j_1, j_2, \ldots, j_n$ denote the integers 1, 2, . . . , n in some new order. We omit proofs of these statements.
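The mapping f of (7.23) is what programmers call a left fold. The sketch below uses Python's standard `functools.reduce` for it; the wrapper name `repeated` is our own, and the checks of (7.25), (7.26), and (7.27) are spot checks on particular tuples only.

```python
from functools import reduce
from itertools import permutations
from operator import mul


def repeated(op, xs):
    """Return f(n) of (7.23): f(1) = x_1 and f(k + 1) = op(f(k), x_{k+1})."""
    xs = list(xs)
    if not xs:
        raise ValueError("the n-tuple must contain at least one element")
    return reduce(op, xs)   # computes ((x_1 op x_2) op x_3) op ... op x_n


def concat(a, b):
    """An associative, noncommutative operation on strings."""
    return a + b


# (7.25): n! is the repeated product of 1, 2, ..., n.
assert repeated(mul, range(1, 6)) == 120

# (7.26): splitting the tuple into two blocks does not change the result.
first, second = ["a", "b", "c"], ["d", "e"]
assert concat(repeated(concat, first), repeated(concat, second)) == repeated(
    concat, first + second
)

# (7.27): for a commutative operation the order of the factors is immaterial.
values = (2, 3, 5, 7)
assert all(repeated(mul, p) == repeated(mul, values) for p in permutations(values))
```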

Another useful symbolism for the repeated product of elements in S is defined by the following equation:


7.28   $\prod_{j=1}^{n} x_j = x_1 x_2 \cdots x_n$


In regard to this notation, we point out that the particular choice of the "dummy" index j is a matter of indifference. Any other letter, i for example, can be used, provided that no confusion is produced thereby [for example, one would not use the letters x or n in place of j in (7.28), for obvious reasons].


In the case of a binary operation with additive notation +, we naturally write $x_1 + x_2 + \cdots + x_n$ instead of $x_1 x_2 \cdots x_n$, and this quantity is also denoted by the symbol

       $\sum_{j=1}^{n} x_j$

The remarks above concerning the dummy index j apply here, too. The additive version of Eq. (7.24) is


7.29   $nx = x + x + \cdots + x$   (n terms)

In a ring both the symbols $x_1 + x_2 + \cdots + x_n$ and $x_1 x_2 \cdots x_n$ are defined for any n-tuple $x_1, x_2, \ldots, x_n$ in the ring.


7H. We conclude with a few simple applications to rings. Let A be a ring (see Definition 2.1), and let $x_1, x_2, \ldots, x_n$ be an n-tuple of elements in A. Then

7.30   $c \cdot (x_1 + x_2 + \cdots + x_n) = c x_1 + c x_2 + \cdots + c x_n$


for any element c in A. (This is a generalized form of the distributive law.) The proof is easily obtained by induction (II). Namely, the equation $c \cdot (x_1 + \cdots + x_k) = c x_1 + \cdots + c x_k$ clearly holds for k = 1. If it holds for some k (with k < n), then it holds also for k + 1. For

       $c \cdot (x_1 + \cdots + x_{k+1}) = c \cdot ((x_1 + \cdots + x_k) + x_{k+1})$
                                        $= c \cdot (x_1 + \cdots + x_k) + c x_{k+1}$

by the distributive law. By assumption the right member is equal to $(c x_1 + \cdots + c x_k) + c x_{k+1}$, which in turn is equal to $c x_1 + \cdots + c x_{k+1}$, by Eq. (7.26) for the operation +. The induction principle (II) then shows that the equation $c \cdot (x_1 + \cdots + x_k) = c x_1 + \cdots + c x_k$ holds for all k = 1, . . . , n. In particular, it holds for k = n, which is (7.30).

In connection with induction proofs, we mention that they are frequently presented with only enough detail to convey the gist of the argument, the remaining details being left to the reader. A little experience with induction proofs will show how much detail is necessary to make the argument clear.

We give a useful application of (7.30): Let the unit element of our ring A be denoted by the symbol 1, and let b be any element of A, n any positive integer. Then


7.31   $(1 - b)(1 + b + b^2 + \cdots + b^n) = 1 - b^{n+1}$

For, by (7.30), the left-hand side is equal to

       $(1 - b) + (1 - b) b + (1 - b) b^2 + \cdots + (1 - b) b^n$
       $= 1 - b + b - b^2 + b^2 - \cdots - b^n + b^n - b^{n+1}$
       $= 1 - b^{n+1}$
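Formula (7.31) is easy to spot-check in the ring of integers. The following sketch does so for a small range of b and n; the helper name `geometric_sum` is our own, and a numerical check of this kind is of course no replacement for the argument above.

```python
def geometric_sum(b, n):
    """Return 1 + b + b**2 + ... + b**n for an integer b and n >= 1."""
    return sum(b**k for k in range(n + 1))


# Check (7.31) for integers b in [-5, 5] and n in [1, 8].
for b in range(-5, 6):
    for n in range(1, 9):
        assert (1 - b) * geometric_sum(b, n) == 1 - b ** (n + 1)
```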

As a final application, let A be an ordered integral domain (see Sec. 3), and let $x_1, x_2, \ldots, x_n$ be an n-tuple of elements in A. Then

7.32   $|x_1 + x_2 + \cdots + x_n| \le |x_1| + |x_2| + \cdots + |x_n|$


The inequality is clearly correct for n = 1. Suppose that it holds for some integer n and for all n-tuples $x_1, \ldots, x_n$. Then it must hold for n + 1. For

       $|x_1 + \cdots + x_{n+1}|$
       $= |(x_1 + \cdots + x_n) + x_{n+1}|$
       $\le |x_1 + \cdots + x_n| + |x_{n+1}|$   (by Exercise 6, part (d), Sec. 3)
       $\le |x_1| + \cdots + |x_n| + |x_{n+1}|$   (by assumption)


The theorem follows by induction (I).


7J. Proof of Definition by Induction (III). To prove (III) of paragraph 7B, let U be the set of all positive integers k such that there is one and only one mapping, call it $f_k$, from the segment [1, k] to E satisfying conditions (7.1), (7.2). That is,


7.33   $f_k(1) = x_1$;
       for $1 \le n < k$ the element $f_k(n + 1)$ in E is determined by the given rule $R_n$ from the elements $f_k(1), f_k(2), \ldots, f_k(n)$.


We observe that if k and l are both in U, say with k < l, then

7.34   $f_k(n) = f_l(n)$   for $1 \le n \le k$


for $f_l$, applied only to elements of the shorter segment [1, k], must satisfy the same conditions (7.33) as $f_k$, and by assumption those conditions determine $f_k$ uniquely.

Now let T be the set of all positive integers not in U. If T is not empty, then it has a least element $k_0$, by Theorem 4.3. We show that this gives a contradiction. Clearly $k_0 > 1$ because the mapping of the segment [1, 1] to E is uniquely prescribed by the condition that 1 be sent into $x_1$. Then $k = k_0 - 1$ is a positive integer in U, and so by definition of U there is one and only one mapping $f_k$ from [1, k] to E satisfying (7.33). We want to show that there is one and only one mapping $f_{k_0}$ from [1, $k_0$] to E satisfying (7.33). Now the segment [1, $k_0$] has only one element not already in [1, k], namely, $k_0 = k + 1$. There cannot be two different mappings $f_{k_0}$, $f'_{k_0}$ from [1, $k_0$] to E meeting the requirements, for both of them applied to integers in [1, k] must satisfy the same conditions as $f_k$ and therefore (by the uniqueness of $f_k$) must coincide with $f_k$ in the shorter segment. That is, $f_{k_0}(n) = f'_{k_0}(n) = f_k(n)$ for $1 \le n \le k$. But then condition (7.33) says that both $f_{k_0}(k + 1)$ and $f'_{k_0}(k + 1)$ are uniquely determined by the rule $R_k$ from certain of the elements $f_k(1), f_k(2), \ldots, f_k(k)$, and so $f_{k_0}(k + 1) = f'_{k_0}(k + 1)$, whence $f_{k_0} = f'_{k_0}$. Hence, $f_{k_0}$ is unique, and it clearly exists, for we have only to put $f_{k_0}(n) = f_k(n)$ for $1 \le n \le k$, then determining $f_{k_0}(k + 1)$ from those elements by the rule $R_k$. This shows that $k_0$ is also in U, contradicting the definition of $k_0$. Hence T must be empty, and so U = J.

Therefore for every positive integer k there is one and only one mapping $f_k$ from the segment [1, k] to E satisfying (7.33). Now for any positive integer n we define $f(n)$ by


       $f(n) = f_n(n)$


This defines $f(n)$ uniquely for any positive integer n, and so f is a mapping from J to E; and f satisfies (7.1) and (7.2), as follows easily from (7.34). Furthermore, f is unique, since for the integers in any segment [1, k] it satisfies the same conditions as $f_k$ and must therefore coincide with $f_k$ for every n in the segment [1, k]. Q.E.D.

The mapping principle (IV) follows from (III) by simply taking $R_n$ to be the rule $f(n + 1) = x_0$ for $n \ge r$, $x_0$ being any fixed element of E.
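The construction in this proof has a direct computational reading: the mapping $f_k$ on the segment [1, k] is a finite list of values, and each list extends the previous one by a single application of the rule. The sketch below is one possible rendering of that idea. The name `define_by_induction` and the convention that a rule receives n together with the list of values f(1), ..., f(n) are assumptions made for this illustration.

```python
def define_by_induction(x1, rule):
    """Return f with f(1) = x1 and f(n + 1) = rule(n, [f(1), ..., f(n)])."""
    values = [x1]                  # values[:k] plays the part of f_k on [1, k]

    def f(n):
        if n < 1:
            raise ValueError("n must be a positive integer")
        while len(values) < n:     # extend f_k to f_{k+1}, one element at a time
            k = len(values)
            values.append(rule(k, list(values)))
        return values[n - 1]       # f(n) = f_n(n)

    return f


# A rule that uses only the last value: f(n + 1) = f(n) * (n + 1), i.e. n!.
factorial = define_by_induction(1, lambda n, prev: prev[-1] * (n + 1))

# A rule that uses the last two values (the Fibonacci numbers 1, 1, 2, 3, 5, ...).
fibonacci = define_by_induction(
    1, lambda n, prev: prev[-1] + (prev[-2] if n >= 2 else 0)
)

assert factorial(5) == 120
assert [fibonacci(n) for n in range(1, 8)] == [1, 1, 2, 3, 5, 8, 13]
```

The second example shows why the rule $R_n$ is allowed to depend on all of the earlier values and not only on the most recent one.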


EXERCISES 

1. Let x and y be elements of a group (written multiplicatively) such that $xy = yx$. Prove that $x^m y^n = y^n x^m$ for all integers m, n.

2. If n is any positive integer, then 2n − 1 is the nth odd integer. Prove by induction that the sum $1 + 3 + \cdots + (2n - 1)$ of the first n odd integers is equal to $n^2$. What is the sum of the first n even integers (the nth even integer being 2n)?

3. Let $E_1, E_2, \ldots, E_n$ be n sets, and suppose that $E_j$ contains $m_j$ elements, for j = 1, 2, . . . , n. Assuming that no two of the given sets have any elements in common, prove that their union (that is, the set of all the elements in all the $E_j$) contains $m_1 + m_2 + \cdots + m_n$ elements. (Use the definition of Sec. 6 and the result of Exercise 1, Sec. 6.)

4. Let A be a ring (see Definition 2.1), and let b be any element of A. Let B denote the subset consisting of the unit element of A and all finite sums of elements $m \cdot b^n$, where m is any integer and where n is any positive integer (see E and F above). Prove that B is a commutative subring of A (see Definition 2.2). (B is called the subring of A generated by b.)

5. Prove formula (7.31) by induction.

6. For any positive integer n prove that

       $2 \cdot (1 + 2 + \cdots + n) = n \cdot (n + 1)$

7. Let a and d be elements of a ring A. Prove that

       $2 \cdot [a + (a + d) + (a + 2d) + \cdots + (a + nd)] = (n + 1)(2a + nd)$

for any positive integer n.

8. Prove that

       $3 \cdot (1^2 + 3^2 + 5^2 + \cdots + (2n - 1)^2) = n(2n - 1)(2n + 1)$

for any positive integer n.

9. Prove that

       $4 \cdot (1^3 + 2^3 + 3^3 + \cdots + n^3) = n^2 (n + 1)^2$

for any positive integer n.

10. Prove that

       $6 \cdot (1^2 + 2^2 + \cdots + n^2) = n(n + 1)(2n + 1)$

for any positive integer n.

11. If 1 + b > 0 in an ordered integral domain (1 being the unit element), prove that $(1 + b)^n \ge 1 + nb$ for any positive integer n.




12. If $x_1, x_2, \ldots, x_n$ are elements of an ordered integral domain such that

       $\prod_{j=1}^{n} x_j = 1$   then   $\sum_{j=1}^{n} |x_j| \ge n$

13. Show that $\sum_{j=1}^{n} j \cdot j! = (n + 1)! - 1$


8. Some elementary number theory 


In the following paragraphs we are going to prove some of the basic theorems of arithmetic, theorems that are the core of the part of mathematics called elementary number theory.† That theory is primarily concerned with questions of divisibility of integers, and we now define that concept. Remember that we always deal with the system Z defined by axioms (1) to (7) of Sec. 4.


DEFINITION 8.1 An integer $m \ne 0$ is said to divide another integer n (or to be a divisor of n) if there is an integer q such that $n = mq$. If that is so, then we write $m \mid n$ and we call q the quotient of n by m (it is uniquely determined, for if also $n = mq'$, then $mq = mq'$ and so $q = q'$, by the cancellation law). If m does not divide n, then we write $m \nmid n$.

As a trivial example, $2 \mid 6$ and the quotient is 3. However $2 \nmid 7$.

We make some simple observations concerning the definition. First observe that 0 is divisible by any integer $m \ne 0$, for $m \cdot 0 = 0$. In the definition we have excluded m = 0, for (if $n \ne 0$) there is no number q such that $n = 0 \cdot q$.‡ Since the number 0 presents some trivial but annoying exceptions, we shall generally exclude it in the following.

Both 1 and −1 divide every integer n, and if $n \ne 0$, then n is divisible by n and −n, the quotients being 1 and −1, respectively. Some other simple consequences of the definition are brought out in


PROPOSITION 8.1 (a) If $m \mid n$, then also $m \mid -n$, $-m \mid n$, and $-m \mid -n$. (b) If $k \mid m$ and $m \mid n$, then $k \mid n$. (c) If $m \mid n$ and $m \mid n'$, then $m \mid (n + n')$. (d) If $m \mid n$, then $m \mid kn$ for any integer k. (e) The integers 1 and −1 are divisible only by 1 and −1. (f) If $m \mid n$, then $|m| \le |n|$. (g) If $m \mid n$ and $n \mid m$, then $m = \pm n$.

Proof. All but (e), (f), and (g) follow very trivially from the definition; and (e) is an immediate consequence of Exercise 2, Sec. 4. To prove (f) we have by definition $n = mq$, and so $|n| = |mq| = |m| \cdot |q|$, by Exercise 6, part (7), Sec. 3. Now $|q| \ge 1$, by Theorem 4.1, and so $|m| \cdot |q| \ge |m|$ (Theorem 3.6). Hence $|n| \ge |m|$.

To prove (g), it follows from (f) that if $m \mid n$ and $n \mid m$, then $|m| \le |n|$ and $|n| \le |m|$. Hence $|m| = |n|$ (by trichotomy), and so $m = \pm n$, by definition of absolute value.


† The term "elementary" does not necessarily mean "easy"; it refers to the nature of proofs and simply means that they do not use calculus, whose methods play an important part in more advanced investigations of number theory.

‡ The student may wonder about the possibility of putting a new element ∞ into the system Z and then defining the quotient of n by 0 to be ∞. There is nothing wrong with doing so, and in fact it is sometimes done. However it follows from Theorem 2.2 that it cannot be done in such a way as to give a ring. In other words, such a definition cannot be made without partially destroying the basic algebraic rules in the system. For most purposes that is too high a price to pay simply to be able to divide by 0.


DEFINITION 8.2 An integer n is called a prime number if it is not equal to 0, 1, or −1 and if its only divisors are ±1 and ±n.

The first few prime numbers are ±2, ±3, ±5, ±7, ±11, ±13, ±17, ±19, etc. It is the study of prime numbers that is the chief object of number theory. Later on we shall mention a few of the more advanced theorems concerning them.


REMARK. From (a) of Proposition 8.1 it follows that divisibility (or nondivisi- 
bility) of one integer by another is completely unaffected by a change of sign. 
Accordingly we shall usually restrict our attention to positive integers in the fol- 
lowing. From now on when we refer to a prime number we shall always mean a 
positive prime, hence one of the integers 2, 3, 5, 7, 11, 13, etc. All the results
we shall derive below will be applicable to negative integers, with transparent 
modifications. 


PROPOSITION 8.2 Every integer greater than 1 is divisible by a prime number.

Proof. Let n be an integer greater than 1, and let T be the set of all integers greater than 1 which divide n. Since $n \mid n$ it is clear that T is not empty, for it certainly contains n. Therefore T has a least element m (Theorem 4.3); and m must be prime. For otherwise it would be divisible by a smaller integer k > 1. But then k would also divide n [Proposition 8.1 (b)] and therefore would have to be in T, a contradiction. Hence m is prime and $m \mid n$.


THEOREM 8.3 (Euclid) There are infinitely many primes.

Proof. Let $p_1, p_2, \ldots, p_r$ be any prime numbers. The integer $q = 1 + p_1 p_2 \cdots p_r$ is divisible by some prime $p_0$, by Proposition 8.2. But q is not divisible by the given primes $p_1, \ldots, p_r$. For suppose, for example, that $p_1 \mid q$. Clearly $p_1$ also divides $p_1 p_2 \cdots p_r$, and so $p_1$ must divide $q - p_1 p_2 \cdots p_r = 1$, by Proposition 8.1 (c). But that is impossible, by Proposition 8.1 (e). It follows then that $p_0$ does not appear among the primes $p_1, \ldots, p_r$; therefore no finite list of primes can exhaust the set of all primes.
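Both proofs above are constructive, and the following sketch follows them step by step: the least divisor greater than 1 of an integer is prime (Proposition 8.2), and applying this to $q = 1 + p_1 p_2 \cdots p_r$ yields a prime outside any given list (Theorem 8.3). The function names `least_divisor_gt1` and `new_prime` are our own, and the trial-division search is chosen for clarity, not speed.

```python
def least_divisor_gt1(n):
    """Return the least integer greater than 1 dividing n; it is prime (Prop. 8.2)."""
    if n <= 1:
        raise ValueError("n must be greater than 1")
    d = 2
    while n % d != 0:      # stops at the latest when d == n, since n divides n
        d += 1
    return d


def new_prime(primes):
    """Given a finite list of primes, return a prime not in the list (Thm. 8.3)."""
    q = 1
    for p in primes:
        q *= p
    q += 1                 # q = 1 + p_1 p_2 ... p_r
    return least_divisor_gt1(q)


assert new_prime([2, 3, 5]) == 31
# q itself need not be prime: 1 + 2*3*5*7*11*13 = 30031 = 59 * 509.
assert new_prime([2, 3, 5, 7, 11, 13]) == 59
```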


The following theorem is central to our topic: 


PROPOSITION 8.4 (Division algorithm) Let m and n be any positive integers. Then there are two integers q and r such that


       $n = qm + r$   and   $0 \le r < m$


Moreover q and r are thereby uniquely determined, and $q \ge 0$.


Proof. Let us prove the uniqueness first. Suppose then that there is a second pair of integers q′ and r′ satisfying the conditions. That is, $n = q'm + r'$ and $0 \le r' < m$. We have $n = qm + r = q'm + r'$, and so $r - r' = (q' - q) \cdot m$, which shows that if $q = q'$, then also $r = r'$. Suppose that $q \ne q'$, say $q' > q$ for definiteness. Then $q' - q \ge 1$ (Theorem 4.1), and so $(q' - q) \cdot m \ge m$ (Theorem 3.6). From above there follows $r - r' \ge m$, or $r \ge m + r' \ge m$ (Theorem 3.5), a contradiction.

To show that q and r exist as claimed, let T be the set of all positive integers k for which $km > n$. T is not empty, for n + 1 itself is in T. Hence T contains a least element $k_0$ (Theorem 4.3). Accordingly we have $k_0 m > n$, but $(k_0 - 1) \cdot m \le n$. Now define $q = k_0 - 1$. These last inequalities say $(q + 1) \cdot m > n$, but $qm \le n$. If we subtract qm from both sides of these inequalities, we get $m > n - mq$ and $0 \le n - mq$, by Theorem 3.5. Now define $r = n - mq$, and we are done.


Remark. The uniquely determined numbers q and r are usually called quotient and remainder, respectively. Observe that to say m divides n means simply that r = 0. As simple numerical examples, if n = 7 and m = 3, then q = 2 and r = 1; if n = 7 and m = 8, then q = 0 and r = 7. The division algorithm is of course nothing but ordinary division with remainder, such as one learns in elementary school. The point here is that it has been stated precisely and has been deduced from the axioms characterizing the system Z. The innocuous appearance of the theorem belies its far-reaching consequences, which we now begin to develop.
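The existence part of the proof of the division algorithm can be followed literally: find the least positive integer $k_0$ with $k_0 m > n$, then put $q = k_0 - 1$ and $r = n - qm$. The sketch below does just that; the function name `divide` is our own, and the linear search mirrors the proof without aiming at efficiency.

```python
def divide(n, m):
    """Return (q, r) with n = q*m + r and 0 <= r < m, for positive integers n, m."""
    if n <= 0 or m <= 0:
        raise ValueError("n and m must be positive integers")
    k0 = 1
    while k0 * m <= n:     # search for the least k with k*m > n
        k0 += 1
    q = k0 - 1             # so q*m <= n < (q + 1)*m
    r = n - q * m
    return q, r


assert divide(7, 3) == (2, 1)
assert divide(7, 8) == (0, 7)
```

For positive arguments the result agrees with Python's built-in `divmod(n, m)`.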


DEFINITION 8.3 Let m and n be any two integers different from 0. An integer d is called the greatest common divisor (g.c.d.) of m and n if

(1) d > 0

(2) d divides m and n

(3) Any integer which divides both m and n also divides d


More generally, if $n_1, n_2, \ldots, n_r$ are integers different from 0, then an integer d is called their greatest common divisor if

(1′) d > 0

(2′) d divides $n_1, n_2, \ldots, n_r$

(3′) Any integer which divides $n_1, n_2, \ldots, n_r$ also divides d


EXAMPLE 2 is the g.c.d. of 4 and 6; 3 is the g.c.d. of 6, 12, and 21; 1 is the g.c.d. of 4 and 15.


REMARK. Suppose that d and d′ are two greatest common divisors of m and n. Then by (2) and (3) we have $d \mid d'$ and also $d' \mid d$. Consequently $d' = \pm d$, by Proposition 8.1 (g). From condition (1) we conclude that $d' = d$. Therefore the g.c.d. of two numbers (we show below that it always exists) is unique. We shall denote the g.c.d. of m and n ($\ne 0$) by the symbol (m, n). From the definition it is readily verified that the following hold:


8.1    $(m, n) > 0$,   $(m, n) = (n, m) = (-m, n) = (m, -n) = (-m, -n)$;
       $(m, n) \le |m|$   and   $(m, n) \le |n|$

The symbol $(n_1, n_2, \ldots, n_r)$ is similarly used to represent the g.c.d. of nonzero integers $n_1, n_2, \ldots, n_r$ (see Exercise 2 below).


THEOREM 8.5 (Euclidean algorithm) Any two positive integers m and n have a unique greatest common divisor d = (m, n). Moreover d can be calculated by the following method of successive divisions: Write $a_0$ for one of the given numbers m, n and $a_1$ for the other. Using the division algorithm (Proposition 8.4), divide $a_0$ by $a_1$, getting $a_0 = q_1 a_1 + a_2$, $a_2$ being the remainder. If $a_2 = 0$, then $d = a_1$; if $a_2 \ne 0$, then divide $a_1$ by $a_2$, getting say $a_1 = q_2 a_2 + a_3$, where $a_3$ is the remainder. If $a_3 = 0$, then $d = a_2$; if $a_3 \ne 0$, then divide $a_2$ by $a_3$, getting $a_2 = q_3 a_3 + a_4$, and so on. The number d is equal to the last nonzero remainder obtained in this way.
Proof. The uniqueness of d has already been shown. To prove that it exists and that the above process gives it, let us write the successive divisions down in an orderly fashion:


       $a_0 = q_1 a_1 + a_2$
       $a_1 = q_2 a_2 + a_3$
8.2    $a_2 = q_3 a_3 + a_4$
       . . . . . . . . . .
       $a_{k-3} = q_{k-2} a_{k-2} + a_{k-1}$
       $a_{k-2} = q_{k-1} a_{k-1} + a_k$
       $a_{k-1} = q_k a_k$


First of all from the division algorithm we have $a_2 < a_1$, $a_3 < a_2$, $a_4 < a_3$, etc. The successive remainders therefore decrease steadily, and it follows that the process must lead to a zero remainder after a finite number of steps, in fact after at most $a_1$ steps. We have assumed above that a zero remainder is first obtained on the kth step. We want to show that the last nonzero remainder, namely $a_k$, is the g.c.d. of the given integers $a_0$ and $a_1$. From the division algorithm all the nonzero remainders (including $a_k$) are positive, and so condition (1) of Definition 8.3 is clearly satisfied. Now from the last equation above we have $a_k \mid a_{k-1}$. Therefore from the next to last equation we get $a_k \mid a_{k-2}$, because $a_k$ divides both terms on the right. From the equation above, we conclude similarly that $a_k \mid a_{k-3}$. Continuing up the list we find that $a_k$ divides all the preceding a's, in particular $a_0$ and $a_1$. Hence condition (2) of Definition 8.3 is satisfied by $a_k$. To check condition (3), let b be any common divisor of $a_0$ and $a_1$. From the first equation of (8.2) it follows that $b \mid a_2$. Then, from the second equation, $b \mid a_3$ because b divides both $a_1$ and $a_2$. Therefore from the third equation we find that $b \mid a_4$. Continuing down the list we conclude that b divides all the a's, in particular $a_k$. This shows that condition (3) is verified, and so $a_k = (a_0, a_1)$.


EXAMPLE 1 Find the g.c.d. of 1426 and 343. Here we put $a_0 = 1426$ and $a_1 = 343$. The calculations are


       1426 = 4 · 343 + 54
        343 = 6 · 54 + 19
         54 = 2 · 19 + 16
         19 = 1 · 16 + 3
         16 = 5 · 3 + 1
          3 = 3 · 1 + 0


Hence (1426, 343) = 1.


EXAMPLE 2 Find (12, 148). The calculations are


       148 = 12 · 12 + 4
        12 = 3 · 4 + 0


and so

       (12, 148) = 4


REMARK. The proof of Theorem 8.5 contains some concealed uses of mathe- 
matical induction. Can you rewrite the proof in such a way as to put them in 
evidence? 
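The method of successive divisions in Theorem 8.5 translates almost word for word into a loop. The following sketch records each division $a_{i-1} = q_i a_i + a_{i+1}$ as a tuple so that the tables of Examples 1 and 2 can be reproduced; the function name `euclid_steps` and the tuple layout are assumptions made for this illustration.

```python
def euclid_steps(a0, a1):
    """Return (d, steps) where d = (a0, a1) and steps lists the successive divisions.

    Each step is a tuple (dividend, quotient, divisor, remainder), meaning
    dividend = quotient * divisor + remainder.
    """
    if a0 <= 0 or a1 <= 0:
        raise ValueError("both integers must be positive")
    steps = []
    while a1 != 0:
        q, r = divmod(a0, a1)          # division algorithm, Proposition 8.4
        steps.append((a0, q, a1, r))
        a0, a1 = a1, r                 # move one line down the list (8.2)
    return a0, steps                   # a0 is now the last nonzero remainder


d, steps = euclid_steps(1426, 343)
assert d == 1
for dividend, quotient, divisor, remainder in steps:
    print(f"{dividend} = {quotient} * {divisor} + {remainder}")
```

Run on 1426 and 343, the loop prints the six lines of Example 1.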


From the euclidean algorithm there follows a very important corollary: 


THEOREM 8.6 If m and n are any positive integers, and if d is their g.c.d., then there exist two integers r and s such that $d = rm + sn$.


Proof. Referring to Eq. (8.2) we see that $d = a_k$ can be expressed as a combination of $a_{k-1}$, $a_{k-2}$. Namely,

       $d = a_{k-2} - q_{k-1} \cdot a_{k-1}$

From the equation immediately preceding, namely, $a_{k-3} = q_{k-2} \cdot a_{k-2} + a_{k-1}$, we have $a_{k-1} = a_{k-3} - q_{k-2} \cdot a_{k-2}$, and so

       $d = a_{k-2} - q_{k-1} \cdot (a_{k-3} - q_{k-2} \cdot a_{k-2})$
         $= (1 + q_{k-1} \cdot q_{k-2}) \cdot a_{k-2} - q_{k-1} \cdot a_{k-3}$


Continuing up the list in this way we can express $a_{k-2}$ as a combination of $a_{k-3}$ and $a_{k-4}$, and then we can express $a_{k-3}$ as a combination of $a_{k-4}$ and $a_{k-5}$, and so on until we finally arrive at an expression for d as a combination of $a_0$ and $a_1$. This is illustrated in the examples below.


Remark. The integers r and s of Theorem 8.6 are not unique. In fact infinitely many pairs will work: if $d = rm + sn$, then also $d = (r + tn)m + (s - tm)n$ for every integer t.


EXAMPLE 3 Referring to Example 1, we have 1 = 16 − 5 · 3 and 3 = 19 − 1 · 16, whence 1 = 16 − 5 · (19 − 16) = 6 · 16 − 5 · 19. From the third equation of Example 1, 16 = 54 − 2 · 19, and so 1 = 6 · (54 − 2 · 19) − 5 · 19, or 1 = 6 · 54 − 17 · 19. From the second equation, 19 = 343 − 6 · 54, and so 1 = 6 · 54 − 17 · (343 − 6 · 54) = 108 · 54 − 17 · 343. From the first equation, 54 = 1426 − 4 · 343, and so 1 = 108 · (1426 − 4 · 343) − 17 · 343 = 108 · 1426 − 449 · 343. Therefore we can take r = 108 and s = −449.


EXAMPLE 4 Referring now to Example 2, we need only use the first equation, getting 4 = 148 − 12 · 12. In this example then we can take r = 1 and s = −12.
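The back-substitution used in Examples 3 and 4 can be organized as follows. If $d = u \cdot a_{i+1} + v \cdot a_{i+2}$ and $a_{i+2} = a_i - q_{i+1} a_{i+1}$, then $d = v \cdot a_i + (u - v q_{i+1}) \cdot a_{i+1}$, so one climbs the list (8.2) by repeatedly replacing the pair (u, v) with (v, u − v q). The sketch below is one way to code this; the function name `bezout` is our own choice.

```python
def bezout(m, n):
    """Return (d, r, s) with d = (m, n) and d = r*m + s*n, for positive m, n."""
    if m <= 0 or n <= 0:
        raise ValueError("m and n must be positive integers")
    # Forward pass: the successive divisions (8.2), remembering the quotients.
    a_prev, a_cur, quotients = m, n, []
    while a_cur != 0:
        q, rem = divmod(a_prev, a_cur)
        quotients.append(q)
        a_prev, a_cur = a_cur, rem
    d = a_prev                          # last nonzero remainder a_k
    # Backward pass: start from d = 0*a_{k-1} + 1*a_k and climb the list.
    u, v = 0, 1
    for q in reversed(quotients[:-1]):  # the last division has remainder 0
        u, v = v, u - v * q
    return d, u, v


assert bezout(1426, 343) == (1, 108, -449)   # Example 3
assert bezout(148, 12) == (4, 1, -12)        # Example 4
```

The particular pair (r, s) returned is the one produced by the back-substitution; as remarked above, other pairs work equally well.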


DEFINITION 8.4 Two integers m and n, different from 0, are said to be relatively prime, or coprime, if their only common divisors are 1 and −1.


PROPOSITION 8.7 Two nonzero integers m and n are relatively prime if and only if (m, n) = 1. If m and n are relatively prime, then there are integers r and s such that $rm + sn = 1$.

Proof. If m and n are relatively prime, according to Definition 8.4, then 1 is clearly their g.c.d., because 1 satisfies the conditions of Definition 8.3. Now suppose conversely that 1 is the g.c.d. of m and n. Let b be any common divisor of m and n. By condition (3) of Definition 8.3 the number b must then divide 1. By Proposition 8.1 (e) the only divisors of 1 are ±1, and so b = ±1. The last assertion of our theorem follows from Theorem 8.6.


EXAMPLE The integers 1426 and 343 are coprime, by Example 1.


THEOREM 8.8 Let p be a prime, and let m and n be two nonzero integers such that $p \mid mn$. Then $p \mid m$ or $p \mid n$.

Proof. It clearly suffices to consider the case m > 0 and n > 0. Suppose then that $p \nmid m$. We must show that $p \mid n$. Now if p does not divide m, then m and p are coprime, because by definition p is divisible only by ±1 and ±p. Hence by Proposition 8.7 there are integers r and s such that $rp + sm = 1$. Multiply this by n: $rpn + smn = n$. Now p clearly divides the first term on the left, and by assumption it divides also the second term. It therefore divides their sum, which is n.


COROLLARY Let $n_1, n_2, \ldots, n_r$ be integers different from zero, and let p be a prime which divides $n_1 n_2 \cdots n_r$. Then p must divide one of the factors $n_1, n_2, \ldots, n_r$.

The proof is left as an exercise.

We now come to our main result:


THEOREM 8.9 (Fundamental Theorem of Arithmetic) Any integer n greater than 1 can be expressed as a product


8.3    $n = p_1 p_2 \cdots p_r$


of positive prime numbers, and this expression is unique apart from the order of the factors.

Proof. We first show that every integer n > 1 can be expressed in at least one way as a product of primes. To do so let T be the set of all integers > 1 that cannot be expressed as a product of a certain number of primes. We want to show that T is empty. If T is not empty, then it contains a least integer $n_0$. Now $n_0$ cannot be prime, for a prime number is already expressed in the required form (with just one factor of course). Now $n_0$ is divisible by some prime p, by Proposition 8.2, say $n_0 = p \cdot m$. Since p > 1, we have $m < n_0$; and m > 1, since $n_0$ is not prime. Hence m is not in the set T and can therefore be expressed as a product of primes, say $m = q_1 q_2 \cdots q_s$. But then $n_0 = p \cdot q_1 q_2 \cdots q_s$, which contradicts our assumption that $n_0$ cannot be so expressed. We conclude therefore that T is empty.

It remains to show that the expression (8.3) is unique (apart from the order of the factors). The proof is similar to the foregoing: If the assertion is false, then there is a least integer n > 1 for which it is false. For that n there must then be two essentially different expressions of the type (8.3), say


8.4    $n = p_1 p_2 \cdots p_r = q_1 q_2 \cdots q_s$

Since $p_1 \mid n$ we have $p_1 \mid (q_1 q_2 \cdots q_s)$. Therefore by the corollary to Theorem 8.8 it follows that $p_1$ must divide one of the primes $q_1, q_2, \ldots, q_s$. By rearranging the order of the factors on the right we can assume that $p_1 \mid q_1$. But then $p_1 = q_1$ since they are both (positive) primes. We can therefore cancel $p_1 = q_1$ on both sides above, getting


       $p_2 p_3 \cdots p_r = q_2 q_3 \cdots q_s$


This integer is smaller than n, because $p_1 > 1$. But it is also greater than 1, for otherwise the original Eq. (8.4) would have to be simply $p_1 = q_1$. Now by assumption the theorem is true for the integer $p_2 p_3 \cdots p_r = q_2 q_3 \cdots q_s$, and therefore we must have r = s and the primes $p_2, p_3, \ldots, p_r$ must coincide with $q_2, q_3, \ldots, q_s$ in some order. It follows that the two sides of (8.4) must be the same except for the order of the factors, and so the theorem is true for n. This contradicts our assumption concerning n and shows that the set of integers n > 1 for which the theorem is false must be empty. Q.E.D.


Remark. The expression (8.3) is called the prime decomposition or factorization
of $n$. The primes $p_1, p_2, \ldots, p_r$ are said to be "contained in $n$" or to be prime
factors of $n$. In general a prime may appear more than once in the prime
decomposition of an integer $n$. If we collect the repeated primes together, then we
can rewrite (8.3) in the form

$n = p_1^{e_1} p_2^{e_2} \cdots p_s^{e_s}$

where $p_1, p_2, \ldots, p_s$ are now distinct primes and where the exponents
$e_1, e_2, \ldots, e_s$ are positive integers. Simple examples are
$3780 = 2^2 \cdot 3^3 \cdot 5 \cdot 7$ and $1728 = 3^3 \cdot 2^6$.
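The collected form of the prime decomposition can be computed by repeated division. The following is a minimal sketch using plain trial division; the function name `prime_decomposition` is our own choice for illustration and does not come from the text.

```python
def prime_decomposition(n):
    """Return the prime decomposition of n > 1 as a list of (prime, exponent) pairs."""
    if n <= 1:
        raise ValueError("n must be an integer greater than 1")
    factors = []
    p = 2
    while p * p <= n:
        e = 0
        while n % p == 0:      # divide out p as often as possible
            n //= p
            e += 1
        if e > 0:
            factors.append((p, e))
        p += 1
    if n > 1:                  # whatever remains is itself a prime
        factors.append((n, 1))
    return factors


print(prime_decomposition(3780))  # [(2, 2), (3, 3), (5, 1), (7, 1)]
print(prime_decomposition(1728))  # [(2, 6), (3, 3)]
```

Trial division is adequate only for small integers, which is all the examples and exercises here require.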


EXERCISES 


1. Find the g.c.d. of the following pairs of integers and find integers r and s of
Theorem 8.6 for each:

(a) 10824, 146        (b) 129, 27
(c) 3423, 21          (d) 1560, -125
(e) -78, -22

2. Prove that the g.c.d. of any finite set of nonzero integers $n_1, n_2, \ldots, n_r$
always exists, and give a method for finding it. Show that the g.c.d. can always be
expressed as a combination $a_1 n_1 + a_2 n_2 + \cdots + a_r n_r$, where $a_1, a_2, \ldots, a_r$
are integers. Work this out for the following numerical cases:

(a) 215, 15, 825
(b) 1460, 122, 55, 12
(c) 17850, 1700, 2000, 55, 20

3. Let $m$ and $n$ be nonzero integers, and let $U$ be the set of all integers of the
form $am + bn$, where $a$ and $b$ are arbitrary integers. Show that $U$ must contain
positive elements, and show that the least positive integer in $U$ is precisely $(m, n)$.
Show how this can be generalized for the case of $r$ given nonzero integers
$n_1, n_2, \ldots, n_r$.

4. Give the prime decompositions of the following integers: 1472, 176, 18365, 
124648, 127. 

5. Prove that if $a \mid bc$ and if $a$ and $b$ are relatively prime, then $a \mid c$ (here $a, b, c$
stand for positive integers).

6. Prove the corollary to Theorem 8.8.

7. The least common multiple (l.c.m.) of two integers $m$, $n$ different from 0 is
defined to be an integer $l$ such that

(a) $l > 0$.
(b) $m \mid l$ and $n \mid l$.
(c) If $m$ and $n$ both divide an integer $h$, then also $l$ divides $h$.

Prove that the l.c.m. is unique and always exists. Assuming $m$, $n$ positive, show
that $d \cdot l = m \cdot n$, where $d$ and $l$ are their g.c.d. and l.c.m., respectively. Show
how to determine the g.c.d. and l.c.m. from the prime decompositions of $m$ and $n$.
Generalize this to the case of more than two integers.

8. Let $n$ be any positive integer, and let $\varphi(n)$ stand for the number of positive
integers $\leq n$ which are relatively prime to $n$. Thus $\varphi(1) = 1$, $\varphi(2) = 1$, $\varphi(4) = 2$,
etc. [$\varphi(n)$ is called the Euler function.] Calculate $\varphi(7)$, $\varphi(12)$, $\varphi(16)$, $\varphi(27)$, $\varphi(35)$.
For any prime $p$ show that $\varphi(p^e) = p^{e-1}(p - 1)$, where $e$ is any positive integer.




*9. Let $m$ and $n$ be relatively prime positive integers. Show that
$\varphi(m \cdot n) = \varphi(m) \cdot \varphi(n)$, where $\varphi$ is defined in Exercise 8. Conclude from this
that if $n$ has the prime decomposition $n = p_1^{e_1} p_2^{e_2} \cdots p_r^{e_r}$, with distinct primes
$p_1, p_2, \ldots, p_r$, then

$\varphi(n) = p_1^{e_1 - 1} p_2^{e_2 - 1} \cdots p_r^{e_r - 1} (p_1 - 1)(p_2 - 1) \cdots (p_r - 1)$

10. Find the g.c.d. $(p_1^{e_1} p_2^{e_2} \cdots p_n^{e_n},\; p_1^{f_1} p_2^{f_2} \cdots p_n^{f_n})$, where $p_1, p_2, \ldots, p_n$
are distinct primes and where the exponents are non-negative integers.


9. Notation for integers 


Everyone is familiar with the usual notation for integers. In this paragraph we 
examine it briefly from the standpoint of what has been done in the foregoing 
sections. First let us recall that in ordinary numerical notation the symbol 67403 
(for example) stands for 


$3 + 0 \cdot 10 + 4 \cdot 10^2 + 7 \cdot 10^3 + 6 \cdot 10^4$

and in general if $a_0, a_1, \ldots, a_r$ represent integers between 0 and 9 (inclusive),
then the symbol $a_r a_{r-1} \cdots a_2 a_1 a_0$ (we do not mean their product here!) stands for

$a_0 + a_1 \cdot 10 + a_2 \cdot 10^2 + \cdots + a_{r-1} \cdot 10^{r-1} + a_r \cdot 10^r$

Our purpose here is to understand the grounds for this mode of writing integers
and to show that any integer $b > 1$ (instead of 10) can be used as a base for such
a means of expressing integers.

The division algorithm (Proposition 8.4) affords the simplest approach to our 
problem. With minor modifications, the same method will be used in Chap. 8 to 
produce decimal expansions of fractions. We begin with two propositions which 
give us estimates on the magnitudes of expressions of the kind just mentioned. 


PROPOSITION 9.1 Let $b$ be any integer. Then

$(b - 1)(1 + b + b^2 + \cdots + b^r) < b^{r+1}$

for any integer $r \geq 0$.

Proof. The assertion follows at once from the identity (7.31):

$(b - 1)(1 + b + b^2 + \cdots + b^r) = b^{r+1} - 1$

and from $b^{r+1} - 1 < b^{r+1}$ (Theorem 8.5).


COROLLARY If $b > 1$, then $b^r > r$ for $r \geq 0$.

Proof. Since $1 \leq b - 1$, by assumption and Theorem 4.2 we have
$c \leq (b - 1) \cdot c$ for any positive integer $c$. Taking $c = 1 + b + b^2 + \cdots + b^r$
we obtain from this and Proposition 9.1

$1 + b + b^2 + \cdots + b^r < b^{r+1}$

Now every term on the left is $\geq 1$, and there are $r + 1$ terms. It follows
readily that the left-hand side here is greater than or equal to $r + 1$. We
have therefore $r + 1 < b^{r+1}$ for $r \geq 0$. This proves the assertion of the
corollary for $r \geq 1$, and the assertion is trivial for $r = 0$. Q.E.D.


PROPOSITION 9.2 Let $b$ be an integer greater than 1, and let $a_0, a_1, a_2, \ldots, a_r$ be
integers between 0 and $b - 1$ inclusive, with $a_r \neq 0$. Put

$m = a_0 + a_1 b + a_2 b^2 + \cdots + a_r b^r$

Then $b^r \leq m < b^{r+1}$.

Proof. Since all the numbers involved here are $\geq 0$, it follows that
$a_0 + a_1 b + \cdots + a_{r-1} b^{r-1} \geq 0$. Therefore by definition of $\leq$ we have
$m \geq a_r b^r$. We have assumed that $a_r \geq 1$ and so $a_r b^r \geq b^r$ (Theorem 8.6),
whence $m \geq b^r$. This proves the first inequality.

We have also assumed that $a_k \leq b - 1$, and so $a_k b^k \leq (b - 1) \cdot b^k$
for $k = 0, 1, \ldots, r$. Hence

$m \leq (b - 1) + (b - 1)b + (b - 1)b^2 + \cdots + (b - 1)b^r$

or

$m \leq (b - 1)(1 + b + b^2 + \cdots + b^r) < b^{r+1}$

by Proposition 9.1. Q.E.D.
We now come to the main theorem of this section. 


THEOREM 9.3 Let $m$ be a positive integer, and let $b$ be an integer greater than 1. Then
there are unique integers $a_0, a_1, \ldots, a_r$ such that

(9.1)    $m = a_0 + a_1 b + \cdots + a_r b^r$    and
         $0 \leq a_j \leq b - 1$ for $j = 0, \ldots, r$, with $a_r \neq 0$


Proof. Divide $m$ by $b$, according to the division algorithm (Proposition 8.4).
The result is $m = q_0 b + a_0$, where $q_0$ is the quotient and $a_0$ is
the remainder. Next divide $q_0$ by $b$, say $q_0 = q_1 b + a_1$, where $q_1$ is the
quotient and $a_1$ the remainder. Then divide $q_1$ by $b$, obtaining $q_1 = q_2 b + a_2$,
and so on. In this way we define (definition by induction!) two sequences
of numbers $q_0, q_1, q_2, \ldots$ and $a_0, a_1, a_2, \ldots$ connected by the
equations

(9.2)    $m = q_0 b + a_0$
         $q_0 = q_1 b + a_1$
         $q_1 = q_2 b + a_2$
         . . . . . . . . . .
         $q_{k-1} = q_k b + a_k$

Now substitute for $q_0$ in the first equation from the second. We get
$m = a_0 + a_1 b + q_1 b^2$. Substituting for $q_1$ from the third equation we get
$m = a_0 + a_1 b + a_2 b^2 + q_2 b^3$. Substituting from the fourth equation for
$q_2$ and so on (we are again using induction here), we obtain after $k$ such
steps the result

(9.3)    $m = a_0 + a_1 b + a_2 b^2 + \cdots + a_{k-1} b^{k-1} + a_k b^k + q_k b^{k+1}$


Observe first that $m \geq q_k b^{k+1}$, since all the terms on the right are $\geq 0$.
But $b^{k+1} > k + 1$ (Corollary to Prop. 9.1), and so we have $m \geq q_k (k + 1)$.
This shows that $q_k = 0$ for $k \geq m$. Of course some of the earlier $q$'s may
also be zero. Let $q_r$ be the first quotient which is zero. Then (9.3), with
$k$ replaced by $r$, is precisely in the required form (9.1), and all the $a$'s lie
between 0 and $b - 1$ (inclusive), by the division algorithm. Moreover,
$a_r \neq 0$, since $a_r = q_{r-1}$ if $r > 0$, and $q_{r-1} \neq 0$, by definition of $r$. (If $r = 0$,
then $a_0 = m$, by (9.2).)

It remains to show the uniqueness of the expression (9.1). Suppose
then that $m = a'_0 + a'_1 b + a'_2 b^2 + \cdots + a'_s b^s$ is another such expression
for $m$, with $0 \leq a'_j < b$ for $j = 0, \ldots, s$. Then clearly $a'_0$ is the remainder
obtained upon division of $m$ by $b$, and so $a'_0 = a_0$, by the uniqueness
part of the division algorithm. And the quotient, namely
$a'_1 + a'_2 b + \cdots + a'_s b^{s-1}$, must be the same as $q_0$ in (9.2). The remainder
of this quotient upon division by $b$ is $a'_1$, and this must be the same as the
remainder upon division of $q_0$ by $b$. That is, $a'_1 = a_1$. Continuing in this way,
one finds by a simple induction that the coefficients of like powers of $b$ in the two
expressions for $m$ must be equal. Q.E.D.


DEFINITION 9.1 The expression (9.1) for a positive integer $m$ is called the b-adic
expansion of $m$.


REMARK. The b-adic expansion looks quite similar to the expressions in the first 
paragraph of this section. It should in fact be clear that the usual notation for 
integers is an abbreviated form of the 10-adic expansion (Example 1 below). The 
proof of Theorem 9.3 contains an effective method for computing the b-adic ex- 
pansion of a given number, as the following examples show. 
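The repeated-division scheme (9.2) translates almost line for line into code. The sketch below is only an illustration; the helper names `b_adic_digits` and `from_digits` are ours, not the book's.

```python
def b_adic_digits(m, b):
    """Return [a_0, a_1, ..., a_r], the b-adic digits of m (lowest power first)."""
    if m <= 0 or b <= 1:
        raise ValueError("need m > 0 and b > 1")
    digits = []
    q = m
    while q != 0:
        # one line of (9.2): (previous quotient) = q*b + a with 0 <= a <= b - 1
        q, a = divmod(q, b)
        digits.append(a)
    return digits


def from_digits(digits, b):
    """Evaluate a_0 + a_1*b + ... + a_r*b^r, i.e. the right-hand side of (9.1)."""
    return sum(a * b**j for j, a in enumerate(digits))


print(b_adic_digits(159, 10))  # [9, 5, 1]
print(b_adic_digits(159, 4))   # [3, 3, 1, 2]
print(b_adic_digits(159, 2))   # [1, 1, 1, 1, 1, 0, 0, 1]
```

The loop stops at the first zero quotient, which is exactly the index $r$ chosen in the proof, so the last digit returned is never zero.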


EXAMPLE 1 For $m = 159$ and $b = 10$ the equations (9.2) are

    159 = 15 · 10 + 9
     15 =  1 · 10 + 5
      1 =  0 · 10 + 1
    etc.

all the $q$'s here being zero beyond $q_1$. We find then, using (9.3) with $k = 2$,
$159 = 9 + 5 \cdot 10 + 1 \cdot 10^2$, and this, as one might expect, is the 10-adic expansion of 159.

EXAMPLE 2 Take $m = 159$, $b = 4$. The equations (9.2) are

    159 = 39 · 4 + 3
     39 =  9 · 4 + 3
      9 =  2 · 4 + 1
      2 =  0 · 4 + 2

the further equations of (9.2) being all of the form $0 = 0 \cdot 4 + 0$. The 4-adic
expansion of 159 is therefore $159 = 3 + 3 \cdot 4 + 1 \cdot 4^2 + 2 \cdot 4^3$.

EXAMPLE 3 Take $m = 159$, $b = 2$. The equations (9.2) are

    159 = 79 · 2 + 1
     79 = 39 · 2 + 1
     39 = 19 · 2 + 1
     19 =  9 · 2 + 1
      9 =  4 · 2 + 1
      4 =  2 · 2 + 0
      2 =  1 · 2 + 0
      1 =  0 · 2 + 1

with all further equations in (9.2) of the form $0 = 0 \cdot 2 + 0$. The 2-adic expansion
of 159 is therefore $159 = 1 + 2 + 2^2 + 2^3 + 2^4 + 2^7$.


Naturally it is possible to use the same kind of positional notation for any
b-adic expansion that we use for the ordinary 10-adic case. For example, the
4-adic expansion of $m = 159$ can be abbreviated by the symbol 2133, by Example 2
above. Similarly its 2-adic expansion can be indicated by 10011111, as follows
from Example 3.

In order to make calculations in some b-adic system of notation one must know
the addition and multiplication tables for the integers $0, 1, \ldots, b - 1$ (these
are the "digits" for the b-adic system). For $b = 5$ we have five digits, which we
call 0, 1, 2, 3, 4. The tables for this case are as follows (omitting 0 from the
addition table and both 0 and 1 from the multiplication table):

    5-ADIC ADDITION TABLE          5-ADIC MULTIPLICATION TABLE

          1    2    3    4               2    3    4
     1    2    3    4   10          2    4   11   13
     2         4   10   11          3        14   22
     3             11   12          4             31
     4                  13

(Observe that in any b-adic system the symbol 10 always denotes the base number
$b$.) Two sample 5-adic calculations are given below:


      4132            143
    + 2124          × 240
    ------         ------
     11311          12320
                    341
                   ------
                   101420
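One way to check 5-adic sums and products is to convert to ordinary integers, compute there, and convert back. The sketch below reads the two sample calculations as 4132 + 2124 and 143 × 240 in 5-adic notation (our reading of the printed digits); the helper `to_base` is a name introduced here for illustration.

```python
def to_base(m, b):
    """Positional b-adic symbol of a non-negative integer m, for 2 <= b <= 10."""
    if m == 0:
        return "0"
    s = ""
    while m > 0:
        m, a = divmod(m, b)   # peel off the lowest digit
        s = str(a) + s
    return s


# int(text, 5) interprets a digit string as a 5-adic symbol.
x, y = int("4132", 5), int("2124", 5)
u, v = int("143", 5), int("240", 5)

print(to_base(x + y, 5))  # 11311
print(to_base(u * v, 5))  # 101420
```

In 10-adic terms these are 542 + 289 = 831 and 48 × 70 = 3360.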


In recent years the 2-adic system (usually called the binary system) has come
into great favor because it has only two digits, 0 and 1, and they can be
represented by the "on" and "off" positions of some sort of electrical switch, making
the binary system particularly suitable for digital computers. In ancient times
the Babylonians used both the 10-adic and 60-adic (sexagesimal) systems, the
former for the average citizen. Observe that the sexagesimal system requires
60 digits, and the usual symbols cannot be used beyond 9, because 10 in that
system means 60, 11 means 61, and so on. In this book we use only the 10-adic
notation, except for the examples and exercises here.


EXERCISES 
1. Find the 2-, 3-, 7-, and 12-adic expansions of the (10-adic) numbers 12,
130, 876.
2. What integers are represented by the 9-adic symbols 17840 and 4031?
3. Write out the addition and multiplication tables for the 3-, 7-, and 12-adic
systems.
4. Derive the rules for working out b-adic sums and products (i.e., the rules for
"carrying" digits in b-adic additions and multiplications).
5. Understanding the symbols 1431 and 265 as 7-adic symbols, work out

      1431              1431
    +  265     and    ×  265

in the 7-adic system. Do the same, interpreting the given symbols as 12-adic.
Check your results by putting everything in the 10-adic system.


10. More elementary number theory: congruences 
The starting point for this paragraph is the following. 


DEFINITION 10.1 Let $m$ be a positive integer, and let $a$ and $b$ be any two integers. Then
$a$ is said to be congruent to $b$ modulo $m$ if $a - b$ is divisible by $m$. If that is so, then we
write $a \equiv b \pmod{m}$. If $a - b$ is not divisible by $m$, then we write $a \not\equiv b \pmod{m}$.

EXAMPLES For $m = 6$ we have $2 \equiv 8 \pmod{6}$, $1 \equiv 7 \pmod{6}$, $-2 \equiv 4 \pmod{6}$,
$48 \equiv 0 \pmod{6}$, $128 \equiv 8 \pmod{6}$, etc.

If $m$ divides $a - b$, then by Definition 8.1 we have $a - b = km$ for some integer
$k$. Hence, $a \equiv b \pmod{m}$ if and only if $a = b + km$ for some integer $k$. All the integers
that are congruent modulo $m$ to a given integer $b$ are obtained by adding multiples of $m$ to $b$.
For example, the integers that are congruent to 2 (mod 6) are the integers $2 + 6k$,
where $k$ is arbitrary. Those integers are

    . . . , -22, -16, -10, -4, 2, 8, 14, 20, 26, 32, . . .


The notion of congruence was first studied systematically by Gauss (1777-1855). 
Congruence has many properties in common with ordinary equality, as the follow- 
ing theorem shows. 


PROPOSITION 10.1 Let $m$ be a positive integer. (i) If $a$ and $b$ are integers such that
$a \equiv b \pmod{m}$, then $b \equiv a \pmod{m}$. If also $b \equiv c \pmod{m}$, then $a \equiv c \pmod{m}$.
(ii) If $a \equiv a' \pmod{m}$ and $b \equiv b' \pmod{m}$, then $a + b \equiv a' + b' \pmod{m}$ and
$a - b \equiv a' - b' \pmod{m}$ and $ab \equiv a'b' \pmod{m}$. (iii) If $a \equiv b \pmod{m}$, then
$a^k \equiv b^k \pmod{m}$ for any positive integer $k$. (iv) If $ac \equiv bc \pmod{m}$ and if $c$ and $m$
are relatively prime, then $a \equiv b \pmod{m}$.

Proof. Parts (i) and (ii) are trivial. For example to show that
$ab \equiv a'b' \pmod{m}$ in (ii), we have by hypothesis $a' = a + km$ because
$m \mid a' - a$, and similarly $b' = b + lm$, where $k$ and $l$ are integers. Then
$a'b' = (a + km)(b + lm) = ab + m \cdot (al + bk + klm)$, which shows that
$a'b' - ab$ is divisible by $m$.

We prove (iii) by induction. The assertion $a^k \equiv b^k \pmod{m}$ holds for
$k = 1$, by hypothesis. Assuming that it holds for some positive integer $k$,
we show that it must hold also for $k + 1$, hence for all positive integers.
If $a^k \equiv b^k \pmod{m}$, then $a \cdot a^k \equiv b \cdot b^k \pmod{m}$ by the last part of (ii),
proved above. That is, $a^{k+1} \equiv b^{k+1} \pmod{m}$.

To prove (iv) we must show that $m \mid (a - b)$. Now by assumption $m$
divides $ac - bc = (a - b) \cdot c$, if $ac \equiv bc \pmod{m}$. Since $m$ and $c$ are
relatively prime, we have $m \mid (a - b)$, by the result of Exercise 5, Sec. 8.

The theorem shows that $\equiv$ can be treated like $=$ except for cancellation.
Part (iv) states that cancellation of a common factor in a congruence is
permissible provided that factor is relatively prime to $m$. Cancellation may
not be permissible otherwise. For example, $3 \cdot 11 \equiv 3 \cdot 7 \pmod{6}$ because
6 divides $3 \cdot 11 - 3 \cdot 7 = 3 \cdot 4 = 12$; but $11 \not\equiv 7 \pmod{6}$.
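The role of the hypothesis in part (iv) can be seen by brute force. The following sketch (the names `congruent` and `cancellation_counterexamples` are ours) searches a small range for pairs where a common factor cannot be cancelled.

```python
def congruent(a, b, m):
    """True when a is congruent to b modulo m, i.e. m divides a - b."""
    return (a - b) % m == 0


def cancellation_counterexamples(c, m, bound=20):
    """Pairs (a, b) below bound with a*c congruent to b*c but a not congruent to b (mod m)."""
    return [(a, b)
            for a in range(bound) for b in range(bound)
            if congruent(a * c, b * c, m) and not congruent(a, b, m)]


# c = 3 is not relatively prime to m = 6, so cancellation can fail:
print((11, 7) in cancellation_counterexamples(3, 6))  # True
# c = 5 is relatively prime to 6, so no counterexample exists:
print(cancellation_counterexamples(5, 6))             # []
```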




PROPOSITION 10.2 Let $m$ be a positive integer. Then any integer $n$ is congruent
(mod $m$) to one and only one of the integers $0, 1, 2, \ldots, m - 1$.

Proof. Let $n$ be any integer. We show first that $n$ cannot be congruent
(mod $m$) to two different integers $a$, $b$ in the segment $0, 1, 2, \ldots, m - 1$.
For if $n \equiv a \pmod{m}$ and $n \equiv b \pmod{m}$, then also $a \equiv b \pmod{m}$, by
Proposition 10.1 (i) above. Supposing that $a > b$, we have $0 < a - b \leq m - 1$,
and it is therefore impossible for $m$ to divide $a - b$ (Proposition 8.1),
a contradiction.

If now $n = 0$, then obviously $n \equiv 0 \pmod{m}$. If $n > 0$, then by the
division algorithm we have $n = qm + r$, where $0 \leq r \leq m - 1$ (Proposition 8.4),
and so $n \equiv r \pmod{m}$. Finally, if $n < 0$, then $n + km > 0$ for
some $k$ ($k = -n + 1$, for example), and by what has just been shown
$n + km$ is congruent (mod $m$) to some integer $0, 1, \ldots, m - 1$, say $r$.
Since $n \equiv n + km \pmod{m}$, it follows that $n \equiv r \pmod{m}$. Q.E.D.


Proposition 10.2 tells us that the integers fall into m different classes—each class 
consisting of all the integers that are congruent modulo m to one of the integers 
0,1,2,..., m—1. These classes are called residue classes modulo m. The 
integers in a given residue class differ from each other by multiples of m—that is, 
they are congruent (mod $m$). Integers in different residue classes are not congruent
(mod $m$).

EXAMPLE For $m = 4$ there are four residue classes, and they are

    . . . , -20, -16, -12, -8, -4, 0, 4,  8, 12, 16, 20, . . .
    . . . , -19, -15, -11, -7, -3, 1, 5,  9, 13, 17, 21, . . .
    . . . , -18, -14, -10, -6, -2, 2, 6, 10, 14, 18, 22, . . .
    . . . , -17, -13,  -9, -5, -1, 3, 7, 11, 15, 19, 23, . . .

The usual way of counting hours on an ordinary clock amounts to reckoning
modulo 12. A clock does not indicate a specific hour; it indicates a residue class
modulo 12. The residue classes modulo 2 are simply the classes of even and odd
integers.


DEFINITION 10.2 Let $m$ be a positive integer. A set of $m$ integers $c_1, c_2, \ldots, c_m$ is
called a complete set of residues modulo $m$ if that set contains one integer from each of
the $m$ residue classes modulo $m$.

EXAMPLE By Proposition 10.2 the integers $0, 1, 2, \ldots, m - 1$ form a complete
set of residues (mod $m$). If we add $m$, say, to each one, then we get another
complete set of residues (mod $m$), namely, $m, m + 1, m + 2, \ldots, 2m - 1$. For
$m = 4$ the following are all complete sets of residues, as is easily seen from the
example above: 0, 1, 2, 3; 4, 5, 6, 7; -4, -3, -2, -1; -20, 5, 14, -13, etc.


PROPOSITION 10.3 Let $m$ be a positive integer. Then a set of $m$ integers $c_1, c_2, \ldots,
c_m$ forms a complete set of residues modulo $m$ if and only if no two of them are congruent
(mod $m$).

Proof. If $c_1, c_2, \ldots, c_m$ do form a complete set of residues, then by
definition they are congruent modulo $m$, in some order, to the integers
$0, 1, 2, \ldots, m - 1$, and therefore no two of the $c$'s can be congruent
(mod $m$) because no two of the integers $0, 1, \ldots, m - 1$ are congruent
(mod $m$).

Conversely, suppose that no two of the $c$'s are congruent (mod $m$). By
Proposition 10.2 each of the $c$'s is congruent (mod $m$) to exactly one of the
integers $0, 1, 2, \ldots, m - 1$; and two different $c$'s cannot be congruent to
the same one of these (mod $m$), for then those two $c$'s would be congruent
to each other (mod $m$). The assertion follows at once.


PROPOSITION 10.4 Let $m$ be a positive integer, and let $a$ and $b$ be any integers, with $a$
relatively prime to $m$. Then if $x$ runs through a complete set of residues (mod $m$), so
does $ax + b$. In other words, if $c_1, c_2, \ldots, c_m$ form a complete set of residues
(mod $m$), then so do $ac_1 + b, ac_2 + b, \ldots, ac_m + b$.

Proof. From Proposition 10.3 we have only to show that no two of the
latter are congruent (mod $m$). Suppose to the contrary that
$ac_i + b \equiv ac_j + b \pmod{m}$, with $i \neq j$. Then $ac_i \equiv ac_j \pmod{m}$, by Proposition
10.1 (ii). Since $(a, m) = 1$ by hypothesis, we can cancel the $a$, by Proposition
10.1 (iv), getting $c_i \equiv c_j \pmod{m}$, which contradicts the assumption
that $c_1, c_2, \ldots, c_m$ form a complete set of residues (mod $m$).

EXAMPLE Take $m = 4$, $a = 7$, $b = -2$. If we let $x$ run through the complete set
of residues 0, 1, 2, 3, then $ax + b = 7x - 2$ runs through -2, 5, 12, 19, which is
also a complete set of residues.
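Definition 10.2 and Proposition 10.4 are easy to test numerically. In this sketch the function name `is_complete_residue_set` is our own; it relies on the fact that Python's `%` returns a value in $0, 1, \ldots, m - 1$ for positive $m$, even when the left operand is negative.

```python
def is_complete_residue_set(cs, m):
    """True when the m integers in cs hit every residue class modulo m exactly once."""
    return len(cs) == m and {c % m for c in cs} == set(range(m))


m, a, b = 4, 7, -2
image = [a * x + b for x in range(m)]          # x runs through 0, 1, 2, 3
print(image)                                   # [-2, 5, 12, 19]
print(is_complete_residue_set(image, m))       # True

# If a is not relatively prime to m the conclusion can fail:
print(is_complete_residue_set([2 * x for x in range(4)], 4))  # False
```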


THEOREM 10.5 (Fermat) Let $p$ be a prime number, and let $a$ be any integer not divisible
by $p$. Then $a^{p-1} \equiv 1 \pmod{p}$.

Proof. Consider the complete set of residues $0, 1, 2, \ldots, p - 1$
modulo $p$. By Proposition 10.4 the integers $0 \cdot a, 1 \cdot a, 2 \cdot a, \ldots,
(p - 1) \cdot a$ also form a complete set of residues modulo $p$. Therefore each
of the first set of numbers is congruent (mod $p$) to one of the second set,
and conversely. Since 0 appears in both sets, it follows that each of the
numbers $1, 2, \ldots, p - 1$ is congruent (mod $p$) to exactly one of the
numbers $1 \cdot a, 2 \cdot a, \ldots, (p - 1) \cdot a$, and vice versa. By repeated
application of Proposition 10.1 (ii) we conclude that
$1 \cdot 2 \cdots (p - 1) \equiv (1 \cdot a)(2 \cdot a) \cdots ((p - 1) \cdot a) \pmod{p}$, or
$(p - 1)! \equiv a^{p-1} (p - 1)! \pmod{p}$. Since $(p - 1)!$ and $p$ are relatively prime,
we can cancel $(p - 1)!$, by Proposition 10.1 (iv), obtaining $1 \equiv a^{p-1} \pmod{p}$.

EXAMPLE Take $p = 7$, $a = 2$. Our first complete set of residues is 0, 1, 2, 3, 4, 5, 6.
The second set is $0 \cdot 2, 1 \cdot 2, 2 \cdot 2, \ldots, 6 \cdot 2$, or 0, 2, 4, 6, 8, 10, 12. We have
$1 \equiv 8$, $2 \equiv 2$, $3 \equiv 10$, $4 \equiv 4$, $5 \equiv 12$, $6 \equiv 6$, all modulo 7, and so
$1 \cdot 2 \cdot 3 \cdot 4 \cdot 5 \cdot 6 \equiv 2 \cdot 4 \cdot 6 \cdot 8 \cdot 10 \cdot 12 \pmod{7}$, whence $6! \equiv 2^6 \cdot 6!$, or
$1 \equiv 2^6 \pmod{7}$. This is easily verified, for $2^6 = 64 = 9 \cdot 7 + 1 \equiv 1 \pmod{7}$.

COROLLARY For any integer $a$ and any prime $p$, $a^p \equiv a \pmod{p}$.

Proof. If $p \nmid a$, then this follows from the theorem above. If $p \mid a$,
then both sides are congruent to 0 (mod $p$), hence are congruent to each
other (mod $p$).
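Fermat's theorem and its corollary can be spot-checked with Python's built-in three-argument `pow`, which computes a power modulo a given integer. This is only a numerical check over a few small primes, not a proof; the function name `fermat_holds` is ours.

```python
def fermat_holds(a, p):
    """Check a^(p-1) congruent to 1 (mod p); meaningful when p is prime and p does not divide a."""
    return pow(a, p - 1, p) == 1


print(pow(2, 6, 7))        # 1, matching 2^6 = 64 = 9*7 + 1
print(fermat_holds(2, 7))  # True

# Corollary: a^p is congruent to a (mod p) for every integer a, including multiples of p.
small_primes = [2, 3, 5, 7, 11, 13]
print(all(pow(a, p, p) == a % p for p in small_primes for a in range(-10, 30)))  # True
```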


We mention here a somewhat more difficult theorem. Its proof will be given in 
Chap. 6. 




THEOREM 10.6 (Wilson) For any prime $p$ we have $(p - 1)! \equiv -1 \pmod{p}$.

EXAMPLE For $p = 7$, $(p - 1)! = 6! = 720 \equiv -1 \pmod{7}$. For $p = 2$,
$(p - 1)! = 1$, and the assertion is correct, for $1 \equiv -1 \pmod{2}$.
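Wilson's theorem can likewise be checked numerically for small primes. The sketch below (with the illustrative name `wilson_holds`) tests whether $p$ divides $(p - 1)! + 1$.

```python
from math import factorial


def wilson_holds(p):
    """True when (p - 1)! is congruent to -1 modulo p."""
    return (factorial(p - 1) + 1) % p == 0


print(factorial(6))      # 720
print(wilson_holds(7))   # True, since 721 = 7 * 103
print(wilson_holds(2))   # True
print(wilson_holds(6))   # False (6 is not prime)
```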


Fermat’s theorem has an important generalization. To get at it we recall the 
Euler function $\varphi(m)$ defined in Exercise 8, Sec. 8. $\varphi(m)$ is defined for any positive
integer $m$ to be the number of integers in the list $1, 2, 3, \ldots, m$ which are
relatively prime to $m$. ($m$ is prime to itself only when $m = 1$.) For example, if $p$ is
prime, then $\varphi(p) = p - 1$, because $1, 2, \ldots, p - 1$ are all relatively prime to $p$.
It is possible to give a fairly simple formula for $\varphi(m)$ in general (see Exercise 9,
Sec. 8), but that need not concern us here. We now make

DEFINITION 10.3 Let $m$ be a positive integer. A set of $r$ integers $c_1, c_2, \ldots, c_r$ is
called a reduced set of residues modulo $m$ if
(1) No two of the $c$'s are congruent (mod $m$).
(2) Each $c_i$ is relatively prime to $m$.
(3) $r = \varphi(m)$.

The proofs of the following two theorems parallel very closely those of
Proposition 10.4 and Theorem 10.5 and are relegated to the exercises.

PROPOSITION 10.7 Let $m$ be a positive integer, and let $a$ be any integer which is
relatively prime to $m$. If $c_1, c_2, \ldots, c_r$ form a reduced set of residues modulo $m$, then so
do $ac_1, ac_2, \ldots, ac_r$.

THEOREM 10.8 (Fermat) Let $m$ be a positive integer, and let $a$ be an integer relatively
prime to $m$. Then $a^{\varphi(m)} \equiv 1 \pmod{m}$.

As noted above, if $p$ is a prime, then $\varphi(p) = p - 1$, and so Theorem 10.5 is a
special case of Theorem 10.8.

EXAMPLE Take $m = 12$, $a = 5$. Then $\varphi(12) = 4$, and so $a^{\varphi(m)} = 5^4 = 625 \equiv 1$
(mod 12).
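The Euler function and the generalized theorem can be explored directly from the definition. The sketch below counts relatively prime integers naively, which is fine for small $m$; the names `phi` and `reduced_residues` are ours.

```python
from math import gcd


def phi(m):
    """Euler function: how many of 1, 2, ..., m are relatively prime to m."""
    return sum(1 for k in range(1, m + 1) if gcd(k, m) == 1)


def reduced_residues(m):
    """The reduced set of residues modulo m taken from 1, 2, ..., m."""
    return [k for k in range(1, m + 1) if gcd(k, m) == 1]


print(phi(12))                  # 4
print(pow(5, phi(12), 12))      # 1, i.e. 5^4 = 625 is congruent to 1 (mod 12)
print(reduced_residues(12))     # [1, 5, 7, 11]
# Proposition 10.7 with a = 5: multiplying permutes the reduced set modulo 12.
print(sorted((5 * c) % 12 for c in reduced_residues(12)))  # [1, 5, 7, 11]
```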


We now turn our attention to the following problem: Given integers $a$ and $b$ and
a positive integer $m$, can we solve the congruence

$ax \equiv b \pmod{m}$?

That is, can we find an integer $x$ which makes this true? It is certainly not always
possible, for $7x \equiv 1 \pmod{7}$ has no solution. Indeed the left-hand side is always
congruent to zero (mod 7), whatever $x$. The congruence $2x \equiv 1 \pmod{6}$ has no
solution, for 6 cannot divide $2x - 1$ for any choice of $x$.

On the other hand, the congruence $4x \equiv 3 \pmod{5}$ has a solution. For example,
$x = 2$ works, and so do $x = 7$, $x = 12$, $x = 17$, $x = 22$, etc. The following theorem
gives an answer to the problem under fairly general circumstances. It is not
difficult to analyze the cases not covered by the theorem, but we shall not do so here.


THEOREM 10.9 Let $a$, $b$, $m$ be integers, with $m > 0$ and $(a, m) = 1$. Then the
congruence $ax \equiv b \pmod{m}$ always has a solution $x$. Any two solutions are congruent
(mod $m$), and conversely if $x$ is a solution, so is any integer which is congruent to $x$
(mod $m$). If $m$ is a prime number, then $x = a^{m-2} b$ is a solution.

Proof. If $(a, m) = 1$, as assumed, then there are integers $r$ and $s$ such
that $ra + sm = 1$, by Proposition 8.7. Multiplying by $b$ we get
$bra + bsm = b$, whence $bra \equiv b \pmod{m}$. Putting $x = br$ we have then $ax \equiv b$
(mod $m$), as required.

Suppose now that $x$ and $x'$ are two solutions to the problem. Then
$ax \equiv b \pmod{m}$ and $ax' \equiv b \pmod{m}$, and so $ax \equiv ax' \pmod{m}$. Therefore
$x \equiv x' \pmod{m}$, by Proposition 10.1 (iv). Conversely, if $x$ is a solution
of the congruence, and if $x \equiv x' \pmod{m}$, then $ax \equiv ax' \pmod{m}$.
By assumption $ax \equiv b \pmod{m}$, and so we get $ax' \equiv b \pmod{m}$, showing
that $x'$ is also a solution.

Finally, if $m$ is prime, then $a^{m-1} \equiv 1 \pmod{m}$ by Fermat's theorem, and
so $a \cdot a^{m-2} b \equiv b \pmod{m}$, showing that $x = a^{m-2} b$ is a solution.


REMARK. From the generalized form of Fermat's theorem (Theorem 10.8) it
follows that $x = a^{\varphi(m)-1} b$ is a solution of the congruence even if $m$ is not prime.

EXAMPLE Solve $343x \equiv 15 \pmod{1426}$. From Example 8, Sec. 8, we have
$108 \cdot 1426 - 449 \cdot 343 = 1$, and so $15 \cdot 108 \cdot 1426 - 15 \cdot 449 \cdot 343 = 15$, whence
$-15 \cdot 449 \cdot 343 \equiv 15 \pmod{1426}$. One solution is therefore $x = -15 \cdot 449 =
-6735$. To this we can add any multiple of 1426.

Solve $4x \equiv 5 \pmod{7}$. Here $m = 7$ is prime, and so one solution is $x = 5 \cdot 4^5 =
5 \cdot 1024 = 5120$. Any number congruent to 5120 (mod 7) is also a solution, for
example its remainder upon division by 7, namely, 3. This problem can also be
solved of course by the method involving the determination of $r$ and $s$ such that
$r \cdot 4 + s \cdot 7 = 1$. For example, we can take $r = -5$ and $s = 3$. Then from
$-5 \cdot 4 + 3 \cdot 7 = 1$ we get $-5 \cdot 5 \cdot 4 + 5 \cdot 3 \cdot 7 = 5$, so that $-25 \cdot 4 \equiv 5 \pmod{7}$,
showing that $-25$ is a solution.

Remark. From Theorem 10.9 we see that a congruence $ax \equiv b \pmod{m}$, with
$a$ and $m$ relatively prime, does not determine just one value of $x$; it determines
an entire residue class modulo $m$. One solution $x$ (and only one) will always be
found among the integers $0, 1, 2, \ldots, m - 1$, and sometimes simple trial and
error is the swiftest way to solve a congruence when $m$ is small.
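The first method of the proof, finding $r$ and $s$ with $ra + sm = 1$, can be sketched with the extended Euclidean algorithm. The function names `extended_gcd` and `solve_congruence` below are illustrative assumptions, and the routine returns the unique solution lying among $0, 1, \ldots, m - 1$.

```python
def extended_gcd(a, b):
    """Return (d, r, s) with d = gcd(a, b) = r*a + s*b, for non-negative a, b."""
    old_rem, rem = a, b
    old_r, r = 1, 0
    old_s, s = 0, 1
    while rem != 0:
        q = old_rem // rem
        old_rem, rem = rem, old_rem - q * rem
        old_r, r = r, old_r - q * r
        old_s, s = s, old_s - q * s
    return old_rem, old_r, old_s


def solve_congruence(a, b, m):
    """Solve a*x congruent to b (mod m) when (a, m) = 1; result is in 0..m-1."""
    d, r, _ = extended_gcd(a % m, m)
    if d != 1:
        raise ValueError("a and m must be relatively prime")
    return (b * r) % m          # x = b*r, reduced modulo m


print(solve_congruence(4, 5, 7))         # 3
print(solve_congruence(343, 15, 1426))   # 395, which is congruent to -6735 (mod 1426)
print((5 * pow(4, 5, 7)) % 7)            # 3 again, via x = a^(m-2) * b for prime m
```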


We have considered here only one kind of problem involving solution of
congruences. Other kinds also arise. As a simple example we might consider two
simultaneous congruences

    $ax + by \equiv c \pmod{m}$
    $a'x + b'y \equiv c' \pmod{m}$

for two unknowns $x$ and $y$. Another type of problem arises if we try to solve two
simultaneous congruences

    $ax \equiv b \pmod{m}$
    $a'x \equiv b' \pmod{m'}$

involving one unknown and two different moduli $m$ and $m'$. Generalizations of this
sort are not difficult to handle, but we shall not go into them here.

A deeper and much more difficult problem arises if the unknown $x$ appears to a
higher power than the first. Examples of such congruences are

    $3x^2 + 2x + 4 \equiv 0 \pmod{5}$
and 
a + 2x" — 8 +2 = 0 (mod 7) 


etc. The integer $x = 4$ is a solution of the first [as well as any integer congruent to
4 (mod 5)]. But in general it is not possible to find integers which solve such
congruences. We shall return to this topic briefly at the end of this chapter and then
again later in Chap. 6.


We now return to residue classes. Let $m$ be a fixed positive integer. As we have
seen, all the integers then fall into $m$ different residue classes modulo $m$. Integers in
different residue classes are not congruent modulo $m$; integers in the same residue
class are congruent modulo $m$. Each residue class contains exactly one of the
integers $0, 1, 2, \ldots, m - 1$. Let us denote the corresponding residue classes by
$\bar{0}, \bar{1}, \bar{2}, \ldots, \overline{m-1}$. We are going to regard these classes as elements of a new
set, which we call $Z_m$. Thus $Z_m$ consists of exactly $m$ elements, each of which is a
residue class of integers modulo $m$.

In the set $Z_m$ we proceed to define two binary operations as follows: Let $\bar{a}$ and $\bar{b}$
be two elements of $Z_m$. Select in the class $\bar{a}$ a representative integer $a$, and select in
the class $\bar{b}$ a representative integer $b$. We then define

(10.1)    $\bar{a} + \bar{b}$ = residue class (mod $m$) containing $a + b$
          $\bar{a}\bar{b}$ = residue class (mod $m$) containing $ab$

First we observe that if we select different representatives $a'$ and $b'$ in the two
residue classes $\bar{a}$ and $\bar{b}$, respectively, then $a \equiv a' \pmod{m}$ and $b \equiv b' \pmod{m}$.
Hence $a' + b' \equiv a + b \pmod{m}$ and $a'b' \equiv ab \pmod{m}$. Therefore $a' + b'$ and
$a + b$ are in the same residue class (mod $m$), and consequently the definition of
$\bar{a} + \bar{b}$ above does not depend upon the particular choice of $a$ and $b$. The same is
true for $\bar{a}\bar{b}$.


REMARK. Observe that the meaning of the symbols $\bar{0}, \bar{1}, \bar{2}$, etc., depends upon $m$.

EXAMPLE Take $m = 5$. Then $Z_5$ consists of five elements $\bar{0}, \bar{1}, \bar{2}, \bar{3}, \bar{4}$. By
definition these elements are residue classes (mod 5). Specifically,

    $\bar{0}$ consists of all multiples $5k$ of 5 ($k$ any integer)
    $\bar{1}$ consists of all integers $1 + 5k$
    $\bar{2}$ consists of all integers $2 + 5k$
    $\bar{3}$ consists of all integers $3 + 5k$
    $\bar{4}$ consists of all integers $4 + 5k$

Let us now apply (10.1) to this example. To compute $\bar{2} + \bar{3}$ we select an integer
in each of those classes, say 2 in $\bar{2}$ and 8 in $\bar{3}$. We add these numbers, getting
$2 + 8 = 10$. Since $10 \equiv 0 \pmod{5}$, the residue class containing $2 + 8$ is $\bar{0}$. Hence, by
definition, $\bar{2} + \bar{3} = \bar{0}$. To compute $\bar{2} \cdot \bar{3}$ we multiply the representatives 2 and 8,
getting 16, of course. Since $16 \equiv 1 \pmod{5}$ we have $\bar{2} \cdot \bar{3} = \bar{1}$, by (10.1). If we
had chosen other representative integers, e.g. $-3$ in $\bar{2}$ and 3 in $\bar{3}$, we would have
got the same result. The complete addition and multiplication tables for $Z_5$ are
given below (each entry $k$ stands for the residue class $\bar{k}$).

    ADDITION IN $Z_5$               MULTIPLICATION IN $Z_5$

        | 0  1  2  3  4                 | 0  1  2  3  4
      --+---------------              --+---------------
      0 | 0  1  2  3  4               0 | 0  0  0  0  0
      1 | 1  2  3  4  0               1 | 0  1  2  3  4
      2 | 2  3  4  0  1               2 | 0  2  4  1  3
      3 | 3  4  0  1  2               3 | 0  3  1  4  2
      4 | 4  0  1  2  3               4 | 0  4  3  2  1

From the first table we observe that $\bar{0}$ is the identity element for addition in $Z_5$.
Since $\bar{1} + \bar{4} = \bar{0}$, we can write $\bar{4} = -\bar{1}$, or $\bar{1} = -\bar{4}$. Similarly $\bar{2} = -\bar{3}$ and
$\bar{3} = -\bar{2}$. It is not hard to verify that the first table is the table of an abelian group.
From the second table we see that $\bar{1}$ is the identity element for multiplication.


THEOREM 10.10 Let $m$ be a positive integer, and let $Z_m$ be the set whose elements are
the residue classes of integers modulo $m$. Then $Z_m$ contains $m$ elements. If addition
and multiplication are defined in $Z_m$ by (10.1), then $Z_m$ is a commutative ring. For
$m > 1$ the ring $Z_m$ is an integral domain if and only if $m$ is a prime, and when that is
so, every element of $Z_m$ except $\bar{0}$ has an inverse for multiplication. That is, if $\bar{a} \neq \bar{0}$,
then there is an element $\bar{b}$ such that $\bar{a}\bar{b} = \bar{1}$.

Proof. The verification that the ring axioms are satisfied is very
straightforward and will be left as an exercise. One simply observes that
first of all $Z$ itself satisfies the ring axioms. By (10.1) calculations in $Z_m$
are essentially calculations in $Z$ ignoring multiples of $m$. Thus one easily
sees that $\bar{0}$ is the identity for addition in $Z_m$. The inverse (for addition) of
$\bar{1}$ is $\overline{m-1}$, for $\bar{2}$ the inverse is $\overline{m-2}$, for $\bar{3}$ the inverse is $\overline{m-3}$, etc. In
other words, $-\bar{1} = \overline{m-1}$, $-\bar{2} = \overline{m-2}$, $-\bar{3} = \overline{m-3}$, etc. The
element $\bar{1}$ is the identity for multiplication.

If $m = 1$, then $Z_1$ contains only one element. It is trivial to see that $Z_1$
satisfies the axioms for an integral domain. Suppose now that $m > 1$ is not
prime. Then there exist two positive integers $a$, $b$ smaller than $m$ such
that $ab = m$. If we let $\bar{a}$ and $\bar{b}$ denote the residue classes containing $a$ and
$b$, respectively, then we have $\bar{a}\bar{b} = \bar{0}$, by (10.1). For according to (10.1),
$\bar{a}\bar{b}$ is the residue class containing $ab = m$, and that class is $\bar{0}$. Since neither
$\bar{a}$ nor $\bar{b}$ is $\bar{0}$, we conclude that $Z_m$ cannot be an integral domain, by Theorem 2.4.

To prove that $Z_m$ is an integral domain if $m$ is a prime, we first prove
the last part of the theorem, namely, that every element of $Z_m$ except
$\bar{0}$ has an inverse for multiplication. To see this let $\bar{a}$ be an element
of $Z_m$ different from $\bar{0}$, and choose a representative integer $a$ in the class $\bar{a}$.
Then $a \not\equiv 0 \pmod{m}$, and since $m$ is prime it follows that $(a, m) = 1$.
Consequently there exists an integer $b$ such that $ab \equiv 1 \pmod{m}$, by Theorem
10.9. Let $\bar{b}$ be the residue class containing $b$. Then by (10.1) we have
$\bar{a}\bar{b} = \bar{1}$, as required. It is now easy to see that $Z_m$ is an integral domain.
According to Theorem 2.4 we have only to show that if $\bar{a}$ and $\bar{c}$ are any two
elements of $Z_m$ different from $\bar{0}$, then $\bar{a}\bar{c} \neq \bar{0}$. To show this let $\bar{b}$ be, as
above, the inverse of $\bar{a}$. That is, $\bar{a}\bar{b} = \bar{1}$. If we had $\bar{a}\bar{c} = \bar{0}$, then multiplying
by $\bar{b}$ on both sides we get $\bar{b}\bar{a}\bar{c} = \bar{b}\bar{0}$, or $\bar{1}\bar{c} = \bar{0}$, or finally $\bar{c} = \bar{0}$,
contradicting the assumption that $\bar{c} \neq \bar{0}$. Q.E.D.
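The dichotomy in Theorem 10.10 shows up clearly in small cases. The following brute-force sketch (function names are ours) lists multiplicative inverses and zero-divisor pairs in $Z_m$, with each class represented by its member in $1, \ldots, m - 1$.

```python
def inverses(m):
    """Map each nonzero class a of Z_m that has a multiplicative inverse to that inverse."""
    return {a: b for a in range(1, m) for b in range(1, m) if (a * b) % m == 1}


def zero_divisor_pairs(m):
    """Pairs a <= b of nonzero classes of Z_m whose product is the zero class."""
    return [(a, b) for a in range(1, m) for b in range(a, m) if (a * b) % m == 0]


print(inverses(5))             # {1: 1, 2: 3, 3: 2, 4: 4}  every nonzero class is invertible
print(zero_divisor_pairs(5))   # []                        so Z_5 is an integral domain
print(inverses(6))             # {1: 1, 5: 5}
print(zero_divisor_pairs(6))   # [(2, 3), (3, 4)]          so Z_6 is not
```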


SOME CONCLUDING REMARKS In this section and the preceding we have given only 
some of the most basic theorems of number theory. The reader who desires to 
pursue this beautiful topic further is referred to one of the standard treatises, for 
example, Hardy and Wright, “The Theory of Numbers,” Oxford (1938). 

As was said earlier, the prime numbers are the chief objects of number theory. 
We now mention briefly some of the more advanced aspects of the subject. 

Let $p$ be any prime number, and consider the congruence

$$ax^2 + 2bx + c \equiv 0 \pmod p$$

where $a$, $b$, $c$ are given integers. The problem is to determine whether there is an
integer $x$ which makes this true. First of all we can suppose that $a \not\equiv 0 \pmod p$,
for otherwise the first term could be struck out, reducing the problem to one already
considered. Multiplying the congruence by $a$, we observe that it can be written
$(ax + b)^2 \equiv b^2 - ac \pmod p$. If we put $y = ax + b$ and $c' = b^2 - ac$, then this
becomes $y^2 \equiv c' \pmod p$. Therefore it suffices for the problem at hand to consider
congruences of the simpler type

10.2    $x^2 \equiv c \pmod p$

It is clear that if $x$ is a solution of this congruence, then so is any integer $x'$ which
is congruent to $x$ (mod $p$). Furthermore (10.2) is essentially unchanged if we
replace $c$ by any integer $c'$ congruent to $c$ (mod $p$). Thus, if we want to do so, we can
restrict ourselves to the complete set of residues $0, 1, 2, \ldots, p - 1$ modulo $p$.




In what follows we shall exclude the trivial case $c \equiv 0 \pmod p$, for which (10.2)
always has the solution $x$ = any multiple of $p$ (and those are the only solutions).
We shall also exclude the case $p = 2$, and so $p$ will always be odd (hence $p - 1$
even).
If we let $x$ run through $1, 2, 3, \ldots, p - 1$, then $x^2$ runs through $1, 2^2, 3^2,
\ldots, (p - 1)^2$. Since $-1 \equiv p - 1 \pmod p$, we have $1 \equiv (p - 1)^2 \pmod p$.
Similarly, $-2 \equiv p - 2 \pmod p$, and so $2^2 \equiv (p - 2)^2 \pmod p$, and so on. It
follows that only half the integers $1, 2, \ldots, p - 1$ can be congruent (mod $p$) to
the square of an integer. [For example, if $p = 7$, then squares of 1, 2, 3, 4, 5, 6 are
congruent (mod 7) to 1, 4, 2, 2, 4, 1, respectively. Therefore, for $p = 7$, the
congruence (10.2) has a solution only if $c$ is congruent (mod 7) to 1, 2, or 4.]
In general, if (10.2) has a solution, then $c$ is called a quadratic residue (mod $p$),
and this is indicated by writing $\left(\frac{c}{p}\right) = 1$. If (10.2) has no solution, then $c$ is
called a quadratic nonresidue (mod $p$), and this is indicated by writing $\left(\frac{c}{p}\right) = -1$.
The symbol $\left(\frac{c}{p}\right)$ defined in this way (remember we assume that $p \nmid c$) is called
the Legendre symbol.

If we let $c$ run through $1, 2, \ldots, p - 1$, then as we have just observed,
$\left(\frac{c}{p}\right) = +1$ for half of those values and $\left(\frac{c}{p}\right) = -1$ for the other half. The
problem is, which half are quadratic residues (+1) and which half are quadratic
nonresidues (−1)? For any particular value of $p$ it is very easy to find out, as we did
above for $p = 7$. Our results there show that $\left(\frac{1}{7}\right) = 1$, $\left(\frac{2}{7}\right) = 1$, $\left(\frac{4}{7}\right) = 1$,
$\left(\frac{3}{7}\right) = -1$, $\left(\frac{5}{7}\right) = -1$, $\left(\frac{6}{7}\right) = -1$. It is however not so easy to prove general
results.
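
For a particular odd prime the Legendre symbol can be found by listing squares, exactly as was done above for p = 7. The Python sketch below does this by brute force; the function name `legendre` is a hypothetical choice made for this illustration.

```python
def legendre(c, p):
    """Legendre symbol of c for an odd prime p, found by listing the squares mod p.

    Assumes p is an odd prime; the symbol is left undefined when p divides c.
    """
    if c % p == 0:
        raise ValueError("the symbol is only defined when p does not divide c")
    squares = {(x * x) % p for x in range(1, p)}
    return 1 if c % p in squares else -1


# Values for c = 1, ..., 6 and p = 7; half of them are +1 and half are -1.
print([legendre(c, 7) for c in range(1, 7)])  # [1, 1, -1, 1, -1, -1]
```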

Legendre (1752-1833) discovered a very remarkable kind of reciprocity
connecting quadratic residues for pairs of primes greater than 2. It can be stated
briefly as follows. Let $p$ and $q$ be two primes greater than 2. Since they are odd
we can write $p - 1 = 2k$ and $q - 1 = 2l$, where $k$ and $l$ are integers. Then the
celebrated law of quadratic reciprocity states simply that

$$\left(\frac{p}{q}\right)\left(\frac{q}{p}\right) = (-1)^{kl}$$

For example, if $p = 5$, $q = 7$, then $\left(\frac{5}{7}\right) = -1$, as noted above, and $\left(\frac{7}{5}\right) = -1$, as
is easily verified. In this case $k = 2$, $l = 3$. (The reader will find it a profitable
exercise to write out the law of quadratic reciprocity in words.)
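
The law can be tested, though of course not proved, on a handful of small primes. The following Python sketch repeats a brute-force Legendre symbol so that it stands on its own; the names `legendre` and `reciprocity_holds` and the particular list of primes are assumptions made for the illustration.

```python
def legendre(c, p):
    """Brute-force Legendre symbol; assumes p is an odd prime not dividing c."""
    squares = {(x * x) % p for x in range(1, p)}
    return 1 if c % p in squares else -1


def reciprocity_holds(p, q):
    """Check (p/q)(q/p) == (-1)**(k*l), where p - 1 = 2k and q - 1 = 2l."""
    k, l = (p - 1) // 2, (q - 1) // 2
    return legendre(p, q) * legendre(q, p) == (-1) ** (k * l)


odd_primes = [3, 5, 7, 11, 13, 17, 19, 23]
print(legendre(5, 7), legendre(7, 5))  # the example p = 5, q = 7
print(all(reciprocity_holds(p, q)
          for p in odd_primes for q in odd_primes if p != q))
```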

Legendre did not succeed in proving this simply stated but profound connection
between pairs of primes. The first proof was given by Gauss at the age of nineteen.
It was published in 1801 in his classical treatise Disquisitiones arithmeticae.




The way in which the prime numbers are distributed among the integers has
long been an area of great mathematical interest. We conclude with two theorems
concerning this question.

For any positive integer $x$ let us define

10.3    $\pi(x)$ = number of primes not exceeding $x$

Thus $\pi(1) = 0$, $\pi(2) = 1$, $\pi(3) = 2$, $\pi(4) = 2$, etc.

Then $\pi(x)$ is approximated by $x/\log x$ for large values of $x$, in the sense that the ratio
of the two quantities tends to 1 as $x$ increases indefinitely (here $\log x$ stands for the
natural logarithm of $x$). This result is known as the prime-number theorem. It was
first proved in 1896 by Hadamard and de la Vallée Poussin.
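
The slow approach of the ratio to 1 can be observed numerically. The Python sketch below counts primes with a simple sieve; the function name `prime_count_table` and the sample values of x are choices made for this illustration, and the computation only suggests the theorem.

```python
import math


def prime_count_table(limit):
    """Return a list pi with pi[x] = number of primes not exceeding x, 0 <= x <= limit."""
    is_prime = [False, False] + [True] * (limit - 1)
    for n in range(2, math.isqrt(limit) + 1):
        if is_prime[n]:
            # Strike out the proper multiples of the prime n.
            for multiple in range(n * n, limit + 1, n):
                is_prime[multiple] = False
    pi, count = [], 0
    for x in range(limit + 1):
        count += is_prime[x]
        pi.append(count)
    return pi


pi = prime_count_table(10**6)
for x in (10**2, 10**4, 10**6):
    # Print x, pi(x), and the ratio of pi(x) to x / log x.
    print(x, pi[x], round(pi[x] / (x / math.log(x)), 4))
```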

Finally we mention the following theorem of Dirichlet (1805-1859): If a is posi- 
tive and if a and b are relatively prime integers, then there are infinitely many primes of 
the form an + b, where n stands for an arbitrary positive integer. 


EXERCISES 


1. What integers are congruent
(a) To 5 (mod 7)?
(b) To 0 (mod 8)?
(c) To −4 (mod 8)?
2. Give three different examples showing that $ac \equiv bc \pmod m$ does not always
imply that $a \equiv b \pmod m$.
3. Give two different complete sets of residues modulo $m$ for $m = 6$, $m = 11$,
$m = 12$. For each of these values of $m$ give three different reduced sets of residues.
4. Verify Fermat's theorem $a^{p-1} \equiv 1 \pmod p$ for a = 4, p = 1, and for a = 6,
p = 18. Verify the generalized Fermat theorem $a^{\varphi(m)} \equiv 1 \pmod m$ for a = 4,
m = 15. Show that the latter does not hold for a = 10 and explain why.
5. Let $m$ be a positive integer, and let $a$ be any integer. Let $C$ stand for one of
the $m$ residue classes modulo $m$. Prove that if one integer in $C$ is relatively prime to
$m$, then every integer in $C$ is relatively prime to $m$.

6. Prove Theorem 10.7. 

7. Prove Theorem 10.8. [The proof is very similar to the proof of Theorem 10.5; 
one has merely to start out with a reduced set of residues (mod m)—in fact that is 
what we really used in Theorem 10.5, for when p is prime the numbers 1,2, .. . , 
Pp — 1 form a reduced set of residues.] 

8. Solve the following congruences: 


(a) 8 = 7 (mod 4) (6) —22 = 12 (mod 11) 
(c) 2 — 6 = 0 (mod 12) (d) 872 = 18 (mod 5) 
(e) 10x — 146 = 0 (mod 12) (f) 27002 = —12 (mod 17) 


(g) 42 — 12 = 0 (mod 11) 
Show that 3 = 7 (mod 15) has no solution. 




9. Let m be an odd integer. Prove that m? = 1 (mod 4). 

10. Write out the verifications that were omitted in the proof of Theorem 10.10. 

11. Write out the addition and multiplication tables for Z, and Zs. 

12. Prove that the only solutions of the congruence $x^2 \equiv 0 \pmod m$ are
multiples of $m$, when $m$ is prime. Show by an example that this need not be the case
when $m$ is not prime.

*13. Let $p$ be a prime > 2. Show that if the congruence $x^2 \equiv c \pmod p$ has
any solutions, then all possible solutions fill up precisely two residue classes (mod
$p$). Show by an example that this need not be the case if $p$ is not prime.

14. Find a solution of the simultaneous congruences 

8x = 5 (mod 7) 
2x = 12 (mod 9) 

Describe the totality of all possible solutions. 

*15. Let $p$ be a prime greater than 2 and put $p - 1 = 2k$. Let $c$ be any integer
not divisible by $p$. Prove that $(c^k - 1)(c^k + 1) \equiv 0 \pmod p$ and deduce from this
that either $c^k \equiv 1 \pmod p$ or $c^k \equiv -1 \pmod p$. Prove that if $\left(\frac{c}{p}\right) = 1$, then
$c^k \equiv 1 \pmod p$, where $\left(\frac{c}{p}\right)$ is the Legendre symbol defined earlier. It can be
shown that, conversely, if $c^k \equiv 1 \pmod p$, then $\left(\frac{c}{p}\right) = 1$. Consequently,
$\left(\frac{c}{p}\right) \equiv c^k \pmod p$.

16. If the integer $N$ is written as $a_k 10^k + \cdots + a_0 10^0$ in decimal notation,
and $S = a_k + \cdots + a_0$ is the sum of its digits, show that $N \equiv S \pmod 9$.
Use this fact to develop a fast method of partially verifying the accuracy of
complicated arithmetical operations (the so-called "rule of nines").

17. Show that $6^x \equiv 6 \pmod{10}$ for any positive integer $x$. Use this fact to
compute the last digit in the decimal expansion of the integers 671, 41%,

18. Find the last digit in the decimal expansions of 3** and 7!

19. Find the remainder of 4" upon division by 7.

20. Show that the sum of the third powers of three consecutive integers is
divisible by 9.

21. Prove that the integers $2^{2^n} + 1$, $n = 0, 1, 2, \ldots$, are relatively prime in
pairs; i.e., no two such integers have a (non-trivial) common factor. [Hint: If
$a \mid 2^{2^n} + 1$, then $2^{2^n} \equiv -1 \pmod a$.]


11. Proof of Theorem 4.4 


THEOREM 4.4 Let $Z$ and $\bar Z$ be two systems satisfying axioms (1) to (7). Then there
is one and only one ring-isomorphism of $Z$ to $\bar Z$. This isomorphism makes the zero
and unit elements of $Z$ and $\bar Z$ correspond, and furthermore it preserves order.
Proof. By a ring-isomorphism we mean, of course, a ring-homomorphism
(Definition 2.2) which is also a one-to-one correspondence.
As we saw in Sec. 2, such a homomorphism, call it $t$, satisfies $t(0) = \bar 0$,
$t(-a) = -t(a)$, $t(a - b) = t(a) - t(b)$, and also (Exercise 5, Sec. 2)
$t(1) = \bar 1$, where 0, 1 and $\bar 0$, $\bar 1$ denote, respectively, the zero and unit
elements of $Z$ and $\bar Z$. Since $t$ is one-to-one, the inverse mapping $t^{-1}$ from $\bar Z$
to $Z$ is also defined and is one-to-one.

If a ring-isomorphism $t$ of $Z$ to $\bar Z$ exists, we claim that the elements of
$J$ and $\bar J$ (the set of positive elements of $\bar Z$) correspond. Indeed, let $U$ be
the set of all elements $x$ in $J$ such that $t(x)$ is in $\bar J$. Since $t(1) = \bar 1$ and $\bar 1$
is in $\bar J$, by Theorem 3.8, it follows that 1 is in $U$. Let $x$ be any element of
$U$. Then $t(x + 1) = t(x) + t(1) = t(x) + \bar 1$, which is in $\bar J$ by axiom (6),
since $t(x)$ and $\bar 1$ are in $\bar J$. Hence $x + 1$ is in $U$, and so by axiom (7),
Sec. 4, $U = J$, that is, for every $x$ in $J$, $t(x)$ is in $\bar J$. Applying the same
argument to $t^{-1}$, we see that every element of $\bar J$ corresponds to an
element of $J$ and $t$ establishes a one-to-one correspondence between $J$ and
$\bar J$. Now let $a$, $b$ be elements of $Z$ such that $a < b$. Then $b - a$ is in $J$,
and therefore $t(b - a) = t(b) - t(a)$ is in $\bar J$. This means that $t(a) < t(b)$.
This is what we mean when we say that $t$ preserves order or that it is
compatible with the orders in $Z$ and $\bar Z$.

We have thus seen that, if an isomorphism exists, it has all the proper- 
ties stated in the theorem. We now show that there cannot be two differ- 
ent isomorphisms. The final step of the proof will have to show of course 
that an isomorphism does exist. 

Let then $t$ and $t'$ be two isomorphisms of $Z$ to $\bar Z$. We know that $t(1) =
t'(1) = \bar 1$. Let $U$ denote the set of all elements $x$ in $J$ such that $t(x) =
t'(x)$. Then 1 is in $U$. Let $x$ be any element of $U$, that is, $t(x) = t'(x)$.
Then $t(x + 1) = t(x) + t(1) = t'(x) + t'(1) = t'(x + 1)$ and so $x + 1$ is
in $U$. Thus, by axiom (7), $U = J$ and $t(x) = t'(x)$ for all $x$ in $J$, that is,
$t$ and $t'$ are the same mapping for elements of $J$. Let us compare $t$ and $t'$
for the other elements of $Z$. Clearly $t(0) = t'(0)$ since they are both
equal to the zero element of $\bar Z$. Now let $x$ be an element of $J'$. Then $-x$
is in $J$ and by the above $t(-x) = t'(-x)$. Hence $t(x) = -t(-x) =
-t'(-x) = t'(x)$, and we conclude that $t$ and $t'$ are the same isomorphism.


The final step will consist of three parts. First we construct a mapping
of $Z$ to $\bar Z$, then show that it is a ring-homomorphism, i.e., it satisfies the
two equations of Definition 2.2; and finally we show that the mapping
is a one-to-one correspondence.

As we have seen we have no choice but to put $t(0) = \bar 0$ and $t(1) = \bar 1$.
We then define $t$ for elements of $J$ by induction, using the formula
$t(x + 1) = t(x) + \bar 1$. That is, the two conditions $t(1) = \bar 1$ and $t(x + 1) =
t(x) + \bar 1$ for $x \geq 1$ determine a unique mapping $t$ from $J$ to $\bar Z$, by Sec. 7 B.
To define $t(x)$ for elements $x$ of $J'$ we put $t(x) = -t(-x)$, the right-hand
side being already defined since $-x$ is in $J$. Note now that the equations
$t(x) = -t(-x)$ and $t(x + 1) = t(x) + \bar 1$ are true for all $x$ in $Z$. We need
prove only the first for $x > 0$ and the second for $x \leq 0$ [since the other
cases are the definitions of $t(x)$ for $x < 0$ and $x > 0$, respectively]. In
the first instance, since $x > 0$, then $-x < 0$ and $t(-x) = -t(-(-x)) =
-t(x)$; so $t(x) = -t(-x)$. In the second instance, if $x = 0$, we have
$t(x + 1) = t(0 + 1) = t(1) = \bar 1 = t(0) + \bar 1$, since $t(0) = \bar 0$. If $x < 0$,
then $x + 1 \leq 0$, and so $-x - 1 \geq 0$. Therefore, from above,
$t(-x - 1 + 1) = t(-x - 1) + \bar 1$, or $t(-x) = t(-x - 1) + \bar 1$. From
what was just shown, this is the same as $-t(x) = -t(x + 1) + \bar 1$, or
$t(x + 1) = t(x) + \bar 1$.
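
The construction of the mapping can be imitated in Python for one concrete second model. In the sketch below the second system is represented by pairs (p, n) of non-negative integers standing for the difference p − n, reduced so that one entry is zero; this model and the names `add_bar`, `neg_bar`, and `t` are assumptions made purely for illustration and do not come from the text.

```python
ZERO_BAR = (0, 0)   # zero element of the second model
ONE_BAR = (1, 0)    # unit element of the second model


def add_bar(u, v):
    """Addition in the second model, followed by reduction to a normal form."""
    p, n = u[0] + v[0], u[1] + v[1]
    m = min(p, n)
    return (p - m, n - m)


def neg_bar(u):
    """Additive inverse in the second model: swap the two entries."""
    return (u[1], u[0])


def t(x):
    """Build t(x) from t(1) = ONE_BAR, t(x + 1) = t(x) + ONE_BAR, t(x) = -t(-x)."""
    if x == 0:
        return ZERO_BAR
    if x < 0:
        return neg_bar(t(-x))
    value = ONE_BAR
    for _ in range(x - 1):
        value = add_bar(value, ONE_BAR)
    return value


# Spot-check additivity, t(x + y) = t(x) + t(y), on a small range.
print(all(t(x + y) == add_bar(t(x), t(y))
          for x in range(-20, 21) for y in range(-20, 21)))
```

The final check is only a finite test of the additivity that the proof establishes in general by induction.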

We now verify that the mapping $t$, defined in the preceding paragraph,
satisfies $t(x + y) = t(x) + t(y)$ for all $x$, $y$ in $Z$. The verification of
$t(xy) = t(x)t(y)$ proceeds along similar lines and is left as an exercise.
We first prove $t(x + y) = t(x) + t(y)$ for $x + y$ in $J$, using axiom (7).
Let $U$ be the set of all elements $z$ in $J$ such that $t(x + y) = t(x) + t(y)$
for all $x$, $y$ in $Z$ such that $x + y = z$.

We claim 1 is in $U$. Indeed, if $x + y = 1$, then $x = -y + 1$, $t(x) =
t(-y + 1) = t(-y) + \bar 1 = -t(y) + \bar 1$ and $t(x) + t(y) = \bar 1 = t(1) =
t(x + y)$. Let now $z$ be in $U$ and $x + y = z + 1$. Then $x + y =
(x - 1) + y + 1$ and $(x - 1) + y = z$. Thus, $t(x + y) = t((x - 1) +
y + 1) = t((x - 1) + y) + \bar 1 = t(x - 1) + t(y) + \bar 1 = t(x - 1) + \bar 1 +
t(y) = t(x) + t(y)$, so that $z + 1$ is in $U$, and $U = J$.

The equation $t(x + y) = t(x) + t(y)$ must still be proved for $x + y \leq 0$.
If $x + y = 0$, then $x = -y$, $t(x) = t(-y) = -t(y)$ and $t(x) + t(y) = \bar 0 =
t(x + y)$. If $x + y < 0$, then $-(x + y) = (-x) + (-y) > 0$, $t(-(x + y)) =
t((-x) + (-y)) = t(-x) + t(-y) = -t(x) - t(y)$ and $t(x + y) =
-t(-(x + y)) = -(-t(x) - t(y)) = t(x) + t(y)$.

We now prove that $t$ is a one-to-one correspondence. According to the
definition (Sec. 2, Chap. 1) we must prove:

(a) For every $\bar x$ in $\bar Z$, there is an element $x$ in $Z$ such that $t(x) = \bar x$.

(b) If $x$, $y$ are elements of $Z$ such that $t(x) = t(y)$, then $x = y$.

To prove (a), let us first assume $\bar x$ is in $\bar J$. Let $U$ be the set of all
elements $\bar x$ in $\bar J$ such that there is an element $x$ in $Z$ with $t(x) = \bar x$. Since
$t(1) = \bar 1$, $\bar 1$ is in $U$. Let now $\bar x$ be in $U$, so that there is an element $x$
in $Z$ with $t(x) = \bar x$. Then $t(x + 1) = t(x) + \bar 1 = \bar x + \bar 1$ and $\bar x + \bar 1$ is in
$U$. Hence $U = \bar J$ and all elements of $\bar J$ correspond to some element of $Z$.
Let now $\bar x \leq \bar 0$. If $\bar x = \bar 0$, then $\bar x = t(0)$ and (a) is satisfied. If $\bar x < \bar 0$,
then $-\bar x > \bar 0$ and there is an element $y$ in $Z$ with $t(y) = -\bar x$. Then $\bar x =
-(-\bar x) = -t(y) = t(-y)$ and (a) is completely verified.

The proof of (b) is very similar to that of (a) and is left as an exercise.
This finishes the proof of Theorem 4.4.


EXERCISES 

1. Prove in detail that if $t$ is a ring-isomorphism of $Z$ to $\bar Z$, then its inverse $t^{-1}$ is
itself a ring-isomorphism.

2. Prove in detail that $t(xy) = t(x)t(y)$ for all $x$, $y$ in $Z$. [Hint: Use the equation
$xy - x = x(y - 1)$ and induction on one of the factors of $xy$.]

3. Prove in detail that if $x$, $y$ are elements of $Z$ such that $t(x) = t(y)$, then $x =
y$. [Hint: Use the equation $t(x) + \bar 1 = t(x + 1)$ and use induction on the element
$t(x)$ of $\bar Z$.]


Fields, the rational numbers 


1. Introduction 


From Sec. 8, Chap. 2, we recall that an integer $a$ is said to divide an integer $b$ if
there is a third integer $x$ such that $ax = b$. That is so only for very special choices
of $a$ and $b$. For example, there is no integer $x$ such that $3x = 2$. The present
chapter is concerned with enlarging the system of integers $Z$ in such a way that the
equation $ax = b$ will always have a solution, provided that $a \neq 0$.

In the case of $3x = 2$ the "answer" is of course $x = 2/3$, and more generally the
solution of $ax = b$ is well known to be $b/a$. Hence our problem is simply that of
introducing fractions. The difficulty is that many fractions represent the same
number. For example, 2/3, 10/15, −4/−6, etc., all denote the same number.
What then is the number itself? A fraction being only a particular name for a
number, it is not entirely obvious that it is possible to have a system of numbers
representable by fractions and enjoying reasonable algebraic properties.

Just as in the case of the system of integers Z, we shall not try to say just what 
rational numbers “are.” We shall instead lay down very simple axioms that 
characterize the rational-number system completely, in the same sense that 
axioms (1) to (7), Sec. 4, Chap. 2, determine the system Z. We begin with some
more general considerations which will be useful to us in connection with other 
mathematical systems. 


2. Fields


We recall from Chap. 2 that an integral domain is a set of elements, say $D$, with
two associative and commutative binary operations (usually called addition and
multiplication and denoted, respectively, by $a + b$ and $ab$ for typical elements
$a$, $b$ of $D$). $D$ with the operation + is required to be an abelian group, and the
operations are required to satisfy the distributive law $a(b + c) = ab + ac$ and the
cancellation law: if $ac = bc$ and $c \neq 0$, then $a = b$. Finally $D$ is supposed to
contain an identity element for multiplication. We shall denote it by the symbol 1,
noting however that it may have no connection with the integer 1.


DEFINITION 2.1 A field $K$ is an integral domain, containing more than one element,
such that any element of $K$ other than zero has an inverse with respect to multiplication.

Thus if $a$ is in $K$ and $a \neq 0$, then there must be in $K$ an element (we denote it as
usual by $a^{-1}$) such that $aa^{-1} = 1$. From Theorems 3.1 and 4.3, Chap. 1, it follows
that the unit element 1 of $K$ is unique and the inverse $a^{-1}$ of any element $a$ is
unique. In particular we have $1^{-1} = 1$ and $(a^{-1})^{-1} = a$ for any element $a \neq 0$,
and $(ab)^{-1} = a^{-1}b^{-1}$ for any elements $a, b \neq 0$. We observe furthermore that it is
not necessary to assume the cancellation law for multiplication in $K$ because it is a
necessary consequence of the existence of inverses, by Theorem 4.4, Chap. 1.
From Theorem 2.4, Chap. 2, we recall that a product $ab$ in $K$ cannot be zero unless
$a = 0$ or $b = 0$.


EXAMPLE From Sec. 10, Chap. 2 (Theorem 10.10), it follows that the set of residue
classes $Z_p$ of integers modulo a prime number $p$ is a field containing $p$ elements.


Recall that subtraction in an abelian group is defined by $a - b = a + (-b)$,
where $-b$ is the inverse of $b$ for addition. In an entirely analogous way we can
define division in a field $K$. That is, if $a$ and $b$ are elements of $K$, with $b \neq 0$, then
we put

2.1    $\dfrac{a}{b} = ab^{-1}$    $(b \neq 0)$

(Sometimes $\frac{a}{b}$ is written $a/b$ or $a \div b$ and is called the quotient of $a$ by $b$; the
operation is neither commutative nor associative in general.) Taking $a$ or $b$ to be 1
in (2.1) and recalling that $1^{-1} = 1$, we get

$\dfrac{1}{b} = b^{-1}$,    $\dfrac{a}{1} = a$

From (2.1) we have further $b(ab^{-1}) = a(bb^{-1}) = a$, and $c(a/b) = c(ab^{-1}) =
(ca) \cdot b^{-1}$, and we get the elementary rules

2.2    $b \cdot \dfrac{a}{b} = a$

2.3    $c \cdot \dfrac{a}{b} = \dfrac{ca}{b}$


PROPOSITION 2.1 If $a$ and $b$ are elements of a field $K$, with $b \neq 0$, then the element of
$K$ denoted by $a/b$ is the unique solution of the equation $bx = a$.
Proof. It is a solution of the equation, by (2.2), and it is unique by the
cancellation law.


PROPOSITION 2.2 Let $a$, $b$, $a'$, $b'$ be elements of a field $K$, with $b$ and $b'$ not zero.
Then $a/b = a'/b'$ if and only if $ab' = a'b$; and that is so if and only if there is an
element $c \neq 0$ in $K$ such that $a' = ca$ and $b' = cb$.

Proof. By definition $a/b = a'/b'$ means $ab^{-1} = a'b'^{-1}$. Multiplying
by $bb'$ and keeping in mind that multiplication in a field is commutative,
we get $ab' = a'b$. Conversely, if $ab' = a'b$, then multiplying by $b^{-1}b'^{-1}$
yields $a/b = a'/b'$.

Now define $c = b'/b$. Then $cb = b'$, by Eq. (2.2), and $c \neq 0$. If $ab' =
a'b$, then multiplying by $b^{-1}$ we get $ab'b^{-1} = a'$. But $b'b^{-1} = c$, and so we
have $a' = ca$. Conversely, if $a' = ca$ and $b' = cb$, then $a'b = cab$ and
$ab' = cab$, whence $ab' = a'b$. Q.E.D.


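PROPOSITION 2.3 Let $a$, $b$, $c$, $d$ be elements of a field $K$, with $b$ and $d$ not zero. Then

$$\frac{a}{b} + \frac{c}{d} = \frac{ad + bc}{bd}, \qquad \frac{a}{b} - \frac{c}{d} = \frac{ad - bc}{bd}$$

$$\frac{a}{b} \cdot \frac{c}{d} = \frac{ac}{bd}, \qquad \frac{a/b}{c/d} = \frac{ad}{bc}, \qquad \left(\frac{c}{d}\right)^{-1} = \frac{d}{c} \quad (\text{if } c \neq 0)$$

Proof. First note that $bd \neq 0$ because $b \neq 0$ and $d \neq 0$ (Theorem 2.4,
Chap. 2). To prove the first equality we have, using (2.1), (2.2), (2.3),

$$bd\left(\frac{a}{b} + \frac{c}{d}\right) = bd \cdot \frac{a}{b} + bd \cdot \frac{c}{d} = d\left(b \cdot \frac{a}{b}\right) + b\left(d \cdot \frac{c}{d}\right) = ad + bc$$

But also

$$bd \cdot \left(\frac{ad + bc}{bd}\right) = ad + bc$$

Therefore both sides of the first equation are solutions of $bdx = ad + bc$.
They are therefore equal, by Proposition 2.1; the other equations follow
from similar arguments and their proof is left as an exercise.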
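
Rules of this kind for adding, subtracting, multiplying, and dividing quotients can be tried out in a small finite field. The Python sketch below works in the field of residue classes modulo the prime 7 and defines division as multiplication by the inverse; the choice of 7 and the names `inv`, `div`, and `check_rules` are assumptions made for the illustration, and an exhaustive check in one field is of course not a proof.

```python
P = 7  # a prime, so the residue classes modulo P form a field


def inv(b):
    """Multiplicative inverse of a nonzero residue, found by direct search."""
    if b % P == 0:
        raise ZeroDivisionError("zero has no inverse")
    return next(x for x in range(1, P) if (b * x) % P == 1)


def div(a, b):
    """The quotient a/b, defined as a times the inverse of b."""
    return (a * inv(b)) % P


def check_rules():
    """Verify the sum, difference, product, and quotient rules for all residues."""
    for a in range(P):
        for c in range(P):
            for b in range(1, P):
                for d in range(1, P):
                    assert (div(a, b) + div(c, d)) % P == div(a * d + b * c, b * d)
                    assert (div(a, b) - div(c, d)) % P == div(a * d - b * c, b * d)
                    assert (div(a, b) * div(c, d)) % P == div(a * c, b * d)
                    if c != 0:
                        assert div(div(a, b), div(c, d)) == div(a * d, b * c)
    return True


print(check_rules())
```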


DEFINITION 2.2 Let $A$ be a subset of a field $K$. If $A$, equipped with the operations of
$K$, is a ring, then $A$ is called a subring of $K$; if a subring $A$ happens to be a field, then
it is called a subfield of $K$.


EXERCISES 

1. Show that the nonzero elements of a field K, with multiplication as binary 
operation, form an abelian group. 

2. Show that the zero and unit elements of a subring A of a field K are the same 
as those of K, provided that A contains more than one element. Show that if A 
is a subfield of K, then the inverse of a nonzero element in A (for either multipli- 
cation or addition) is the same as its inverse in K. Prove that a subring of a field 
is an integral domain. 

3. If $a$, $b$, $c$, $d$ are elements of a field with $a/b = c/d$, prove the following:


bod 
(yeas ) 
ey ote @ 
a+d a 
©) ar y= 




assuming of course that the various denominators are not zero. 


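4. If $\dfrac{a_1}{b_1} = \dfrac{a_2}{b_2} = \cdots = \dfrac{a_k}{b_k}$ in a field $K$, show that

$$\frac{c_1a_1 + c_2a_2 + \cdots + c_ka_k}{c_1b_1 + c_2b_2 + \cdots + c_kb_k} = \frac{a_1}{b_1}$$

where $c_1, c_2, \ldots, c_k$ are arbitrary elements of $K$, not all zero, and where $k$ is a
positive integer.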

*5. Let $D$ be an integral domain containing only a finite number of elements.
Prove that $D$ is a field. [Hint: For any $a \neq 0$ in $D$ consider the elements $a$, $a^2$, $a^3$,
etc.]

*6. Let $a$ be an element of a field $K$, $a \neq 0$. In Sec. 7, Chap. 2, it was shown
how to define the element $na$ of $K$ for any integer $n$; for example, $3a = a +
a + a$, etc. Now suppose that $na = 0$ for some integer $n \neq 0$. Prove that the
smallest positive integer for which $na = 0$ must be a prime, say $p$, the same for all
nonzero elements of $K$. [Hint: Consider the case $a = 1$ = unit element of $K$.]
The prime $p$ is called the characteristic of $K$. If $n \cdot 1 \neq 0$ for all positive integers
$n$, then $K$ is said to have characteristic zero.

7. Let $A$ be a subring of a field $K$, containing more than one element. Let $F$
be the subset of $K$ consisting of all elements $a/b$, where $a$ and $b$ are in $A$ and $b \neq 0$.
Show that $F$ is a subfield of $K$.

8. Write out the rules corresponding to (2.2), (2.3) for the operation +. Do
the same for the rules $(a/b) \div (c/d) = ad/bc$ and $(c/d)^{-1} = d/c$.

9. Consider the following array

A B C
B C A
C A B

This is a so-called Latin square: every letter A, B, C occurs once in every row and
once in every column. Such squares are extremely useful in the design of statistical
experiments. Suppose, as a specific example, that one wanted to study the relative
yield of $n$ kinds of seed. The simple-minded way would be to subdivide a field in
$n$ parcels and plant one kind of seed in each. The objection to this technique is
that the fertility of the soil might well vary from point to point in the field, thus
altering the results. If, however, one used a square field and subdivided it into
$n^2$ squares, labeled by $A_1, A_2, \ldots, A_n$ according to an $n \times n$ Latin square, and
planted the $i$th kind of seed in each square labeled $A_i$, then the effect of any
variation in the soil would be greatly diminished.
Consider now a second array

a b c
c a b
b c a

This is again a Latin square, and if we combine it with the first one we obtain the
following array:

Aa Bb Cc
Bc Ca Ab
Cb Ac Ba

Note that in this combined array every possible combination of a capital and a
lower-case letter occurs exactly once. We say that the two Latin squares are
orthogonal. Such orthogonal Latin squares enable one to study simultaneously
(and therefore economically) the effect of two different types of factors, each $n$
in number (for example, seed and fertilizer). Unfortunately orthogonal $n \times n$
Latin squares do not exist for every integer $n$. A particularly simple method of
constructing orthogonal Latin squares is the following:

Let $K$ be a finite field containing $n$ elements, $x_0 = 0, x_1, x_2, \ldots, x_{n-1}$. Then
the addition table of $K$ yields an $n \times n$ array $T_1$ whose $(i, j)$ entry† is $x_i + x_j$.
Let $1 \leq k \leq n - 1$ and consider the array $T_k$ whose $(i, j)$ entry is $x_kx_i + x_j$.
Prove that $T_k$ is a Latin square and that $T_k$, $T_l$ ($k \neq l$) are orthogonal. Construct
four $5 \times 5$ Latin squares, which are pairwise-orthogonal. Can this procedure
yield $6 \times 6$ Latin squares?
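
The construction just described can be tried on a computer when the finite field is the field of residue classes modulo a prime p, with the elements numbered so that the i-th element is the class of i. The Python sketch below makes exactly that assumption; the names `square`, `is_latin`, and `are_orthogonal` are hypothetical, and the code only tests particular arrays rather than proving the general statement.

```python
def square(k, p):
    """The array whose (i, j) entry is k*i + j modulo p, for rows and columns 0..p-1."""
    return [[(k * i + j) % p for j in range(p)] for i in range(p)]


def is_latin(T):
    """True if every symbol 0..n-1 occurs exactly once in each row and each column."""
    n = len(T)
    symbols = set(range(n))
    rows_ok = all(set(row) == symbols for row in T)
    cols_ok = all({T[i][j] for i in range(n)} == symbols for j in range(n))
    return rows_ok and cols_ok


def are_orthogonal(S, T):
    """True if superimposing S and T produces every ordered pair of symbols exactly once."""
    n = len(S)
    pairs = {(S[i][j], T[i][j]) for i in range(n) for j in range(n)}
    return len(pairs) == n * n


squares = [square(k, 5) for k in range(1, 5)]
print(all(is_latin(T) for T in squares))
print(all(are_orthogonal(squares[a], squares[b])
          for a in range(4) for b in range(a + 1, 4)))
```

The sketch covers only a prime number of symbols, since it relies on arithmetic modulo p being a field.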


3. The field of rational numbers 


The following theorem states in effect that an integral domain can always be em- 
bedded in a field in an essentially unique way. 


THEOREM 3.1 Given an integral domain $D$ containing more than one element, there
exists a field $K$ containing $D$ as a subring and such that every element of $K$ can be
expressed as the quotient of two elements of $D$. $K$ is called the field of quotients of $D$.
If $K'$ is a second field satisfying the same conditions, then there is one and only one
isomorphism from $K$ to $K'$ which sends elements of $D$ into themselves.

The existence of such a field $K$ is proved in Chap. 15. The last part of the
theorem says that $K$ is essentially unique. The proof that there is a unique
isomorphism as stated is very simple. We wish to define a one-to-one mapping $f$
from $K$ to $K'$ such that $f(a) = a$ for any $a$ in $D$ and such that $f(a + b) = f(a) +
f(b)$ and $f(ab) = f(a) \cdot f(b)$ for any $a$, $b$ in $K$ (cf. Sec. 2, Chap. 2). Now the
theorem says that any element $c$ of $K$ can be expressed in the form $c = a/b$, where $a$, $b$
are in $D$ and of course $b \neq 0$. The same is true for $K'$. Now we simply map the
element $c$ of $K$ into the element $c'$ of $K'$ represented by the same symbol $a/b$.
Using Proposition 2.2 it is easy to see that the result is independent of the
particular representation of $c$ as a quotient of elements of $D$ and that the mapping
has all the required properties.


† Let the horizontal lines be called rows and the vertical ones columns; then the "$(i, j)$ entry"
is the symbol located at the intersection of the $i$th row and $j$th column.


Thus K and K’ are interchangeable. It is convenient to think of D as having 
attached to it a particular field of quotients, and for that reason we speak of the 
field of quotients of D. 

We now apply Theorem 3.1 to the integral domain Z:


DEFINITION 3.1 The field of quotients of the system of integers Z is called the field of 
rational numbers and is denoted by Q. The elements of Q are called rational num- 
bers. 

Thus $Z$ is embedded in a larger set $Q$ which is a field, and every rational number
$r$, that is, element of $Q$, can be expressed (in many ways) as the quotient $r = a/b$
of two integers.


THEOREM 3.2 If $K$ is a field containing the integers $Z$ as a subring, then $K$ contains
a subfield $F$ which can be identified with $Q$, that is, is isomorphic to $Q$ in one and
only one way.
Proof. Since $K$ is a field, every integer $a \neq 0$ has an inverse $a^{-1}$ in $K$.
Let $F$ be the set of all elements $a/b$ in $K$, where $a$, $b$ are integers, with
$b \neq 0$. Then $F$ is a subfield of $K$ (cf. Exercise 7, Sec. 2). Since $F$ satisfies
the conditions of Theorem 3.1, it follows from that theorem that there is
one and only one isomorphism from $F$ to $Q$ which sends elements of $Z$ into
themselves.
But in fact there can be no other isomorphisms at all. That is, there
can be no isomorphism $f$ from $F$ to $Q$ which does not send every integer
into itself. For $f(1) = 1$, by Exercise 5, Sec. 2, Chap. 2. Then $f(2) =
f(1 + 1) = f(1) + f(1) = 2$, etc. By induction we deduce that $f(n) = n$
for every positive integer $n$. By (2.1), Chap. 2, we have $f(-n) = -f(n) =
-n$, and so $f$ also sends negative integers into themselves. Q.E.D.


The theorem says in substance that any field $K$ containing $Z$ as a subring (i.e.,
the operations in $Z$ are the same as in $K$) must contain a "copy" of the rational
number field $Q$. In this sense, $Q$ can be thought of as the smallest field containing
$Z$.

The following proposition asserts the possibility of "reducing a fraction to lowest
terms."


PROPOSITION 3.3 Any rational number $r \neq 0$ can be expressed in one and only one
way as a quotient $r = m/n$, with $n > 0$ and $m$, $n$ relatively prime.

Proof. By definition of $Q$ the number $r$ can be expressed as the quotient
of two integers $r = a/b$, with $a, b \neq 0$. We can assume $b > 0$, for
otherwise we have only to replace $a$, $b$ by $-a$, $-b$. Now let $d = (a, b)$ = g.c.d.
of $a$ and $b$. Then we can write $a = md$ and $b = nd$. Since $b$, $d$ are
positive, we have $n > 0$, and $r = a/b = md/nd = m/n$, where now
$(m, n) = 1$. The proof of uniqueness is left as an exercise.
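
The existence half of this argument is a short computation. The Python sketch below follows the same two steps, first making the denominator positive and then dividing out the greatest common divisor; the function name `lowest_terms` is a hypothetical choice.

```python
from math import gcd


def lowest_terms(a, b):
    """Return (m, n) with a/b = m/n, n > 0, and m, n relatively prime."""
    if b == 0:
        raise ZeroDivisionError("the denominator must be nonzero")
    if b < 0:
        # Replace a, b by -a, -b so that the denominator is positive.
        a, b = -a, -b
    d = gcd(a, b)  # math.gcd returns a non-negative value
    return a // d, b // d


print(lowest_terms(-4, -6))   # (2, 3)
print(lowest_terms(10, -15))  # (-2, 3)
```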




$Z$ is an ordered integral domain, and we now show that $Q$ inherits an ordering
from $Z$.

DEFINITION 3.2 A field $K$ is called an ordered field if it is an ordered integral domain
(Definition 3.1, Chap. 2).

Thus it is required that the nonzero elements of $K$ be split into two subsets,
say $F$ and $F'$, such that $F'$ consists of the negatives of elements of $F$ and such that
if $a$, $b$ are in $F$, then so are $a + b$ and $ab$. We continue with the conventions of
Sec. 3, Chap. 2, for the symbol <. Thus $a < b$ means that $b - a$ is in $F$.


THEOREM 3.4 Let $a$, $b$, $c$, $d$ be elements of an ordered field $K$, with $b$ and $d$ positive.
Then $a/b < c/d$ if and only if $ad < bc$. In particular (taking $c = d = 1$), $a/b < 1$
if and only if $a < b$.
Proof. By assumption, $bd > 0$. Therefore $a/b < c/d$ if and only if
$(bd)(a/b) < (bd)(c/d)$, by Theorem 3.6, Chap. 2, and this is just $ad < bc$,
by Eq. (2.3) above. Q.E.D.


DEFINITION 3.3 Let $r$ be a rational number different from zero, and let it be expressed
as the quotient $r = a/b$ of two integers $a$, $b$. Then $r$ is defined to be positive ($r > 0$) if
the integer $ab$ is positive, and $r$ is defined to be negative ($r < 0$) if $ab$ is negative.

First of all we must show that the definition does not depend upon the particular
fraction $a/b$. Suppose that $r = a'/b'$ also. We have then $ab' = a'b$ (by
Proposition 2.2), and so $aba'b' = (a'b)^2$. The right-hand side is positive (Theorem 3.8,
Chap. 2) and therefore $ab$ and $a'b'$ must have the same sign (Exercise 7(), Sec. 3,
Chap. 2).

THEOREM 3.5 The rational number field $Q$ is an ordered field. Its ordering is
compatible with that of $Z$. That is, a positive integer is also a positive element of $Q$.
Proof. Let $P$ be the set of positive rational numbers (according to
Definition 3.3), and let $P'$ be the set of negative rational numbers. Clearly
$P$ contains $J$ (the set of positive integers), for any element $n$ of $J$ can be
written as $n = n/1$, and then $n/1$ is positive by Definition 3.3. $P$ and $P'$
satisfy the requirements of Definition 3.1, Chap. 2, as is easily verified.
Q.E.D.


The following theorem exhibits the great structural difference between the 
systems Z and Q, for it shows that between any two rational numbers there is 
always another rational number. 


PROPOSITION 3.6 Let $a$ and $b$ be two rational numbers, with $a < b$. Then for the
rational number $c = \frac{1}{2}(a + b)$ we have $a < c < b$. Furthermore $|c - a| =
|c - b| = \frac{1}{2}(b - a)$.
Proof. $c - a = \frac{1}{2}a + \frac{1}{2}b - a = \frac{1}{2}b - \frac{1}{2}a = \frac{1}{2}(b - a)$.
By assumption, $b - a > 0$, and so $c - a > 0$, or $c > a$. Furthermore,
$|c - a| = |\frac{1}{2}(b - a)| = \frac{1}{2}(b - a)$. The argument for $c - b$ is similar.
Q.E.D.




Taking $a = 0$, $b > 0$, the number $c$ above is $\frac{1}{2}b$, and as the theorem shows,
$0 < \frac{1}{2}b < b$. From this we conclude at once that the set $P$ of positive rational
numbers has no least element.
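
Both observations are easy to try with exact rational arithmetic. The Python sketch below uses the standard `fractions.Fraction` type; the function name `midpoint` and the sample values are assumptions made for the illustration.

```python
from fractions import Fraction


def midpoint(a, b):
    """The rational number halfway between a and b."""
    return (a + b) / 2


a, b = Fraction(1, 3), Fraction(1, 2)
c = midpoint(a, b)
print(c, a < c < b, c - a == b - c == (b - a) / 2)

# Repeated halving toward 0 always yields a smaller positive rational,
# so no positive rational can be the least one.
x = Fraction(1)
for _ in range(5):
    x = midpoint(Fraction(0), x)
print(x)
```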

The following theorem shows that rational numbers are always "comparable
in size" with integers.†


THEOREM 3.7 For every rational number $r$ there is a unique integer $q$ such that
$q \leq r < q + 1$.
Proof. Suppose first that $r > 0$. Then we can write $r = m/n$, with
$m$ and $n$ positive. Using the division algorithm, write $m = qn + a$,
where $0 \leq a < n$ (Sec. 8, Chap. 2). Then $r = m/n = q + a/n$, and since
$1 > a/n \geq 0$ (by Theorem 3.4) we have $r \geq q$ but $r < q + 1$. Hence
$q \leq r < q + 1$. The proof for $r < 0$ is left as an exercise. Q.E.D.
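
The integer q of the theorem is what the division algorithm produces. In the Python sketch below the built-in `divmod` plays the role of the division algorithm; the function name `integer_part` is hypothetical. With a positive divisor `divmod` returns a remainder between 0 and n − 1 even when m is negative, so the same lines also handle the case left as an exercise.

```python
def integer_part(m, n):
    """For r = m/n with n > 0, return the integer q with q <= r < q + 1."""
    if n <= 0:
        raise ValueError("write r = m/n with a positive denominator n")
    q, a = divmod(m, n)  # m = q*n + a with 0 <= a < n
    return q


print(integer_part(7, 2))   # 3, since 3 <= 7/2 < 4
print(integer_part(-7, 2))  # -4, since -4 <= -7/2 < -3
```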


EXERCISES 


1. If $a$ is an element of an ordered field, prove that $a > 0$ if and only if $1/a > 0$.
2. Let $D$ be an ordered integral domain. Prove that the following three
conditions on elements of $D$ are equivalent:

$|a - b| < c$
$b - c < a < b + c$
$a - c < b < a + c$


3. Let a be a positive element of an ordered field. Show that a > 1 if and only 
if 1/a <1. If 6, ¢ are two other positive elements, prove that 


f> ptt tase and FefEt ita ce 


+e b<bHe 


4. Prove that any finite subset of an ordered field contains a least element and a
greatest element.
5. Let $c_1, c_2, \ldots, c_n$ be positive elements of an ordered field $K$, and let $b_1,
b_2, \ldots, b_n$ be any elements of $K$. Suppose that the elements $b_1/c_1, b_2/c_2, \ldots,
b_n/c_n$ are not all equal. Prove that

$$\frac{b_1 + b_2 + \cdots + b_n}{c_1 + c_2 + \cdots + c_n}$$

lies between the largest and the smallest of the elements $b_i/c_i$.
6. Let $r$ and $b$ be two rational numbers greater than 1. Prove that there is a
unique integer $n \geq 0$ such that $b^n \leq r < b^{n+1}$.
7. Prove that any ordered field $K$ contains a subfield $F$ which is isomorphic to $Q$,
the isomorphism $Q \to F$ being compatible with the ordering. [Hint: Consider the
elements $n \cdot 1$ of $F$, where 1 = unit element of $F$, $n$ being any integer.]


† The point here is that one can find examples of ordered fields containing $Z$ as a subring
and containing elements which are greater than every integer.




8. Prove that $Q$ is a countable set. [Hint: Consider the enumeration process
indicated below.]


(This method of enumeration is called Cantor’s first diagonal process.) 

9. Let $a$, $b$, $c$, $d$ be positive elements of an ordered integral domain, with $a < b$
and $c < d$. Prove that $ac < bd$. Prove that $a^n < b^n$ for every positive integer $n$.
If $a < 1$, show that $a^m > a^n$ for any two positive integers with $m < n$.

10. Let $r$ be a positive rational number, $< 1$, and let $c$ be a rational number such
that $0 \le c < r^n$ for every positive integer $n$. Show that $c = 0$. [Hint: Write $r$ as
a quotient of integers $r = p/q$. Show that $q > p$ and prove by induction that
$q^n \ge p^n + n \cdot p^{n-1}$ for all positive integers $n$. Deduce from this that $r^n \le
p/(n + p)$, hence that $n < p/c$, if $c \ne 0$. Then apply Theorem 3.7.]

11. Supply the proof of uniqueness in Proposition 3.3.

12. Let $c$ be a positive rational number. Prove that if $c' = (2 + c)/(1 + c)$,
then $|2 - c'^2| < |2 - c^2|$.


4. Decimals 


In this paragraph we shall show how the familiar idea of decimal expansions is 
related to the foregoing theory. The discussion is very closely related to that of 
Sec. 9, Chap. 2. In order to avoid annoying minus signs, we shall generally con- 
fine our attention to positive rational numbers. 

To fix ideas, consider (for example) the number 253.1402. We recall that the
symbol stands for

$$253 + \frac{1}{10} + \frac{4}{10^2} + \frac{0}{10^3} + \frac{2}{10^4}$$

Of course 253 itself stands for $2 \cdot 10^2 + 5 \cdot 10 + 3$. Since we have already dealt
with 10-adic expansions of integers in Sec. 9, Chap. 2, let us turn our attention
to the decimal part $r = .1402$. The first decimal digit, namely 1 in this example,
tells us that $r$ lies in the range $\tfrac{1}{10} \le r < \tfrac{2}{10}$. The second decimal digit 4 then tells
us that $r$ lies in the smaller range $\tfrac{1}{10} + \tfrac{4}{100} \le r < \tfrac{1}{10} + \tfrac{5}{100}$. Similarly, the
third digit 0 tells us then that $r$ is in the yet smaller range $\tfrac{1}{10} + \tfrac{4}{100} + \tfrac{0}{1000} \le r
< \tfrac{1}{10} + \tfrac{4}{100} + \tfrac{1}{1000}$, and so on. Hence the first decimal digit specifies our
number within $\tfrac{1}{10}$; the second digit then specifies it within $\tfrac{1}{100}$, etc.

The slight difficulty that arises is that some numbers have unending decimal
expansions. For example, $\tfrac13 = .333 \cdots$. Now it would not make sense prima
facie to write

$$\frac13 = \frac{3}{10} + \frac{3}{10^2} + \frac{3}{10^3} + \cdots$$

as we did with .1402 above, because that would indicate an infinite sum. But in a
field (in particular the field $Q$) only finite sums are defined. It turns out that
infinite sums can be defined under certain circumstances; this point will be taken 
up in the following chapter. For present purposes, this question can be avoided. 


DEFINITION 4.1 A decimal is an infinite sequence† of integers $a_0, a_1, a_2, a_3$, etc., satis-
fying the following conditions:

$a_0 \ge 0$ and $0 \le a_j < 10$ for $j = 1, 2, 3$, etc.

The decimal is said to be a terminating decimal if all the $a_j$ are zero from some index
on, say $a_j = 0$ for all $j \ge N$. The decimal is said to be recurrent, or periodic, if
there are two positive integers $N$ and $p$ such that $a_{j+p} = a_j$ for all $j \ge N$.

Observe that if the decimal is recurrent, then $a_{N+p} = a_N$, $a_{N+2p} = a_{N+p} = a_N$,
$a_{N+3p} = a_{N+2p} = a_N$, etc. The same is true for $a_{N+1+p} = a_{N+1}$, $a_{N+1+2p} =
a_{N+1+p} = a_{N+1}$, etc. Therefore, starting with $a_N$, the numbers $a_j$ simply repeat over and over
again in blocks of length $p$. For example, $75.13421212121 \cdots$ is recurrent. Here
we can take $N = 4$ and $p = 2$. Note that a terminating decimal is recurrent.

A decimal is not a number; according to our definition it is simply an unending 
sequence of integers of which all but the first are required to be between 0 and 9. 
In Chap. 4 we shall see that a decimal can be interpreted as a number. 


A decimal, as defined above, will be written $a_0.a_1a_2a_3 \cdots$ and will sometimes
be abbreviated by a single letter. For example, $A = a_0.a_1a_2a_3 \cdots$, $B =
b_0.b_1b_2b_3 \cdots$, and so on. With the decimal $A = a_0.a_1a_2a_3 \cdots$, we associate
the rational number

(4.1)    $A_n = a_0 + \dfrac{a_1}{10} + \dfrac{a_2}{10^2} + \cdots + \dfrac{a_n}{10^n}$

for any integer $n \ge 0$.
The following proposition shows how much $A_n$ of (4.1) can change as $n$ increases.


PROPOSITION 4.1 $n$ and $k$ being positive integers, let $a_{n+1}, a_{n+2}, \ldots, a_{n+k}$ be integers
between 0 and 9. Let $r$ be the rational number

$$r = \frac{a_{n+1}}{10^{n+1}} + \frac{a_{n+2}}{10^{n+2}} + \cdots + \frac{a_{n+k}}{10^{n+k}}$$

† For the definition of infinite sequence, see Sec. 6, Chap. 2.




Then

(4.2)    $\dfrac{a_{n+1}}{10^{n+1}} \le r < \dfrac{a_{n+1} + 1}{10^{n+1}} \le \dfrac{1}{10^n}$

Proof. We have $10^{n+k} \cdot r = a_{n+1} \cdot 10^{k-1} + a_{n+2} \cdot 10^{k-2} + \cdots + a_{n+k}$,
and so $a_{n+1} \cdot 10^{k-1} \le 10^{n+k} \cdot r < (a_{n+1} + 1) \cdot 10^{k-1}$, by Proposition 9.2,
Chap. 2. The assertion follows at once. Q.E.D.

COROLLARY Let $A = a_0.a_1a_2a_3 \cdots$ be a decimal, and let $A_n$ be defined by (4.1).
Then for any positive integer $k$ we have

(4.3)    $A_n \le A_{n+k} < A_n + \dfrac{1}{10^n}$

DEFINITION 4.2 A decimal $A = a_0.a_1a_2a_3 \cdots$ is called a decimal expansion of a
rational number $r$ if

(4.4)    $A_n \le r \le A_n + \dfrac{1}{10^n}$

for all positive integers $n$, where $A_n$ is the rational number defined by (4.1).

Since we have assumed that $a_0, a_1, a_2, \ldots$ are all $\ge 0$, it follows that $A_n \ge 0$
and so $r \ge 0$, by (4.4). Our definition is thus limited to non-negative rational
numbers. It is easily seen that a decimal $A$ cannot be the decimal expansion of two
different rational numbers. For the inequality (4.4) can be written $0 \le r - A_n \le
1/10^n$. If $A$ were also a decimal expansion of say $r_1$, then we would have similarly
$0 \le r_1 - A_n \le 1/10^n$. Hence, by a simple calculation, $|r - r_1| \le 1/10^n$ for all
$n = 1, 2, 3$, etc., and so $|r - r_1| = 0$, or $r = r_1$. For we have $1/10^n < 1/n$, by
Proposition 9.1, Corollary, Chap. 2, and so $|r - r_1| < 1/n$ for all $n$. If $r \ne r_1$ this
would give $n < 1/|r - r_1|$ for all $n$, an impossibility, by Theorem 3.7 (see Exercise
10, Sec. 3).

Suppose now that $A$ is a terminating decimal, say $a_j = 0$ for $j \ge N$. It follows
at once that all the numbers $A_n$ derived from $A$ by (4.1) are equal for large enough
$n$, namely, for $n \ge N - 1$. Let their common value be $r$. It follows easily from
Proposition 4.1 that $A$ is a decimal expansion of $r$, in the sense of Definition 4.2.

If however $A$ is a nonterminating decimal expansion of a rational number $r$, then
the numbers $A_n$ never equal $r$; but they approximate $r$ with better and better
accuracy as $n$ increases.

The following theorem shows that every positive rational number has a decimal 
expansion. Moreover it shows how to compute it. 


THEOREM 4.2 Every (positive) rational number $c$ has a decimal expansion, which is
moreover recurrent.
Proof. Write $c = a/b$, where $a$, $b$ are positive integers. We use the
division algorithm repeatedly (Proposition 8.4, Chap. 2). Namely, divide
$a$ by $b$, getting remainder $r_0$, say. Divide $10r_0$ by $b$, getting remainder $r_1$.
Then divide $10r_1$ by $b$, and so forth. The calculation takes the form

(4.5)
$a = a_0 b + r_0$    $0 \le r_0 < b$
$10r_0 = a_1 b + r_1$    $0 \le r_1 < b$
$10r_1 = a_2 b + r_2$    $0 \le r_2 < b$
$\cdots$
$10r_{n-1} = a_n b + r_n$    $0 \le r_n < b$
$10r_n = a_{n+1} b + r_{n+1}$    $0 \le r_{n+1} < b$
etc.


These operations produce two sequences of numbers $a_0, a_1, a_2$, etc., and
$r_0, r_1, r_2$, etc. We show that the former is a decimal expansion of $c$. First
of all, for the typical element $a_n$, with $n > 0$, we have $a_n b = 10r_{n-1} -
r_n \le 10r_{n-1} < 10b$. Hence $a_n < 10$. Clearly all the $a_n$ are integers $\ge 0$,
and so $A = a_0.a_1a_2a_3 \cdots$ indeed satisfies the definition of a decimal.
Now from the equations above we get

$$c = \frac{a}{b} = a_0 + \frac{r_0}{b} = a_0 + \frac{a_1}{10} + \frac{r_1}{10b} = \cdots = a_0 + \frac{a_1}{10} + \cdots + \frac{a_n}{10^n} + \frac{r_n}{10^n b}$$

for any positive integer $n$. Now the number $A_n$ of (4.1) for our decimal $A$
is just the first part of the expression above. That is, $c = A_n + \dfrac{r_n}{10^n b}$.
Therefore $A_n \le c < A_n + \dfrac{1}{10^n}$ since $r_n/b < 1$, and so $A$ is indeed a decimal
expansion of $c$ according to Definition 4.2.

To show that $A$ is recurrent we note that all the remainders $r_n$ satisfy
$0 \le r_n < b$, and therefore $r_n$ must be one of the integers $0, 1, 2, \ldots,
b - 1$. Hence after at most $b$ steps in the computations (4.5) the remainder
must be equal to one of the earlier remainders, and therefore the computa-
tions simply begin to repeat. Q.E.D.


EXAMPLE 1 For $c = 11941/4950$ the calculations are $11941 = 2 \cdot 4950 + 2041$,
$20410 = 4 \cdot 4950 + 610$, $6100 = 1 \cdot 4950 + 1150$, $11500 = 2 \cdot 4950 + 1600$,
$16000 = 3 \cdot 4950 + 1150$. Here we have arrived at a remainder obtained
earlier, and so now the computations simply repeat. The desired decimal is
$2.41232323 \cdots$.
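
The repeated division can be carried out mechanically. The following is a minimal sketch in Python, not part of the original text; the function name `decimal_expansion` and the dictionary `seen` are our own illustrative choices. It performs the division steps of the proof and stops as soon as a remainder recurs.

```python
def decimal_expansion(a: int, b: int):
    """Long division of a by b (positive integers), one digit per step.

    Returns (a0, digits, start): a0 is the integer part, digits lists
    a1, a2, ..., and digits[start:] is the block that repeats forever.
    """
    a0, r = divmod(a, b)           # a = a0*b + r0
    digits = []
    seen = {}                      # remainder -> position of the digit it produces
    while r not in seen:
        seen[r] = len(digits)
        d, r = divmod(10 * r, b)   # 10*r_{n-1} = a_n*b + r_n
        digits.append(d)
    return a0, digits, seen[r]

a0, digits, start = decimal_expansion(11941, 4950)
print(a0, digits[:start], digits[start:])   # integer part, prefix, repeating block
```

Since there are only $b$ possible remainders, the loop ends after at most $b$ digits, which is the counting argument used in the proof.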


THEOREM 4.3 Let $A = a_0.a_1a_2a_3 \cdots$ be a recurrent decimal. Then $A$ is a decimal
expansion of a rational number $r$. In fact, if $a_{j+p} = a_j$ for all $j > k$, then

$$r = \frac{10^p \cdot A_{k+p} - A_k}{10^p - 1}$$

It is not hard to see that, if the steps (4.5) are carried out, the result is precisely
the decimal $A$, save in the case that the $a_j$ are all equal to 9 from some point on,
in which case (4.5) gives a terminating decimal. We omit the details since another
proof is given in Proposition 2.8, Chap. 4.


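EXAMPLE 2 The decimal $2.41232323 \cdots$ is recurrent, and here we can take
$k = 2$, $p = 2$. According to the theorem,

$$r = \frac{10^2 \cdot A_4 - A_2}{10^2 - 1} = \frac{241.23 - 2.41}{99} = \frac{23882}{9900} = \frac{11941}{4950}$$

as in Example 1 above. For the decimal $0.3333 \cdots$ we have $k = 0$, $p = 1$,
whence $r = (10A_1 - A_0)/9 = (3 - 0)/9 = 1/3$.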
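
The formula of Theorem 4.3 is easy to evaluate exactly. Here is a small sketch, assuming Python's `fractions` module; the helper names `partial_sum` and `recurrent_value`, and the convention of passing the non-repeating digits and the repeating block as two lists, are ours.

```python
from fractions import Fraction

def partial_sum(a0, digits, n):
    """A_n = a0 + a1/10 + ... + an/10^n, where digits = [a1, a2, ...]."""
    return a0 + sum(Fraction(d, 10 ** (j + 1)) for j, d in enumerate(digits[:n]))

def recurrent_value(a0, prefix, block):
    """Value of the decimal a0.(prefix)(block)(block)... by Theorem 4.3,
    taking k = len(prefix) and p = len(block)."""
    k, p = len(prefix), len(block)
    digits = list(prefix) + list(block)
    A_k = partial_sum(a0, digits, k)
    A_kp = partial_sum(a0, digits, k + p)
    return (10 ** p * A_kp - A_k) / (10 ** p - 1)

print(recurrent_value(2, [4, 1], [2, 3]))   # the decimal of Example 2
print(recurrent_value(0, [], [3]))          # 0.3333...
```

Exact rational arithmetic is used on purpose: floating-point numbers would only approximate the quotients involved.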

It is easily verified directly by referring to Definition 4.2 that the integer 1 has
the two decimal expansions $1.0000 \cdots$ and $0.9999 \cdots$. The next theorem
states that decimal expansions are unique except for this type of ambiguity.
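
The condition of Definition 4.2 can be tested for finitely many $n$ by direct computation. This is of course no proof, since the definition requires the inequality for all $n$. The sketch below assumes Python's `fractions` module; the name `is_expansion_up_to` and the device of describing a decimal by a function `digit_at(n)` are our own.

```python
from fractions import Fraction

def is_expansion_up_to(r, a0, digit_at, n_max):
    """Check A_n <= r <= A_n + 1/10^n for n = 1, ..., n_max."""
    A = Fraction(a0)
    for n in range(1, n_max + 1):
        A += Fraction(digit_at(n), 10 ** n)        # A is now A_n
        if not (A <= r <= A + Fraction(1, 10 ** n)):
            return False
    return True

print(is_expansion_up_to(1, 1, lambda n: 0, 50))   # 1.0000...
print(is_expansion_up_to(1, 0, lambda n: 9, 50))   # 0.9999...
print(is_expansion_up_to(1, 0, lambda n: 8, 50))   # 0.8888..., fails at n = 1
```

For $0.9999 \cdots$ the right-hand inequality holds with equality at every step, which is why Definition 4.2 uses $\le$ rather than $<$ on that side.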


THEOREM 4.4 Let $A = a_0.a_1a_2a_3 \cdots$ and $B = b_0.b_1b_2b_3 \cdots$ be two different deci-
mal expansions for the same rational number $r$. Then one of them, say $A$, must
terminate: $a_j = 0$ for $j > N$, but $a_N \ne 0$. Then for $B$ we must have $b_j = a_j$ for $j = 0,
1, \ldots, N - 1$ and $b_N = a_N - 1$, while $b_j = 9$ for $j > N$.
Proof. Since $A$ and $B$ are different there must be a smallest $j$ for which
$a_j \ne b_j$, say $j = N$. Then $a_N \ne b_N$ but $a_j = b_j$ for $j < N$. Let us suppose,
say, that $a_N > b_N$. For the numbers

$$A_n = a_0 + \frac{a_1}{10} + \cdots + \frac{a_n}{10^n} \qquad B_n = b_0 + \frac{b_1}{10} + \cdots + \frac{b_n}{10^n}$$

we have by assumption $A_n \le r \le A_n + 1/10^n$, or $0 \le r - A_n \le 1/10^n$,
and similarly $0 \le r - B_n \le 1/10^n$, for all $n = 1, 2, 3$, etc. It follows
easily that $|A_n - B_n| \le 1/10^n$. But for $n = N$ we have $A_N - B_N =
(a_N - b_N)/10^N \ge 1/10^N$, since $a_N > b_N$ and $a_j = b_j$ for $j < N$. Since also
$A_N - B_N \le 1/10^N$, from above, there follows $A_N = B_N + 1/10^N$, whence
$a_N = b_N + 1$, which is one of the things we wished to prove.

We claim now that $B_{N+k} < A_N$ for all $k$. Indeed $0 \le B_{N+k} - B_N <
1/10^N$, by (4.3), and as we have just seen, $A_N - B_N = 1/10^N$. Hence
$A_N - B_{N+k} = (A_N - B_N) - (B_{N+k} - B_N) > 0$. From this we conclude
that $A_N = r$. For $0 \le r - B_{N+k} \le 1/10^{N+k}$, but $r - B_{N+k} = (r - A_N) +
(A_N - B_{N+k}) > r - A_N$, and so $0 \le r - A_N < 1/10^{N+k}$ for all $k = 1, 2,
3$, etc. Hence $r - A_N$ must be zero. Therefore the decimal $A$ must
terminate at $N$, for otherwise we would have $A_{N+k} > A_N = r$ for large
enough $k$, which would contradict $0 \le r - A_n \le 1/10^n$ for all $n$.




It remains to show that $b_j = 9$ for $j > N$. Suppose then that $b_{N+l} < 9$
for some $l$. Taking $k > l$ we have

$$B_{N+k} - B_N = \frac{b_{N+1}}{10^{N+1}} + \cdots + \frac{b_{N+l}}{10^{N+l}} + \cdots + \frac{b_{N+k}}{10^{N+k}}$$

All the $b$'s are $\le 9$, and we have assumed that $b_{N+l} < 9$. Hence

$$B_{N+k} - B_N \le -\frac{1}{10^{N+l}} + \frac{9}{10^{N+1}} + \cdots + \frac{9}{10^{N+k}}$$

Now from Eq. (7.31), Chap. 2, we have

$$\frac{9}{10^{N+1}} + \cdots + \frac{9}{10^{N+k}} = \frac{9}{10^{N+k}} \cdot (1 + 10 + \cdots + 10^{k-1}) = \frac{9}{10^{N+k}} \cdot \frac{10^k - 1}{9} = \frac{1}{10^N} - \frac{1}{10^{N+k}}$$

Therefore $B_{N+k} - B_N \le 1/10^N - 1/10^{N+l}$ for all $k > l$. On the other
hand we have from above $r - B_N = 1/10^N$ and $r - B_{N+k} \le 1/10^{N+k}$,
from which $B_{N+k} - B_N = (r - B_N) - (r - B_{N+k}) \ge 1/10^N - 1/10^{N+k}$.
Comparing with above, $1/10^N - 1/10^{N+l} \ge 1/10^N - 1/10^{N+k}$, or $10^{N+k} \le
10^{N+l}$ for all $k > l$, a contradiction. Q.E.D.


To sum up, every non-negative rational number has a decimal expansion, nec-
essarily recurrent. Conversely, any recurrent decimal is an expansion of some
rational number. Decimal expansions are unique except in the case of rational
numbers with terminating decimal expansions. Such numbers have a second
expansion differing from the terminating one in the manner described in Theorem
4.4. Observe that for such numbers the process (4.5) always yields the terminating
expansion because the numbers $A_n$ satisfy the sharper inequality $A_n \le c < A_n +
1/10^n$. To include the case of negative rational numbers we simply allow minus
signs before decimal expansions.

In the next chapter we shall see that the rational-number system Q can be en- 
larged to a system R in which nonrecurrent decimals will also be expansions of 
numbers. 


EXERCISES
1. Find all decimal expansions of the following rational numbers:
M4, 2s * Ma 17, 16, Ma, #9926581, 9718 fa25, Too $4 
2. What rational numbers have the following decimal expansions:
(a) 25.21412121212 . . .    (b) 8.22800300800 . . .
(c) 4.19250000 . . .    (d) 2.714285714285 . . .

Verify your answers.
3. Give a criterion which will enable you to decide in advance whether a given
fraction $a/b$ has a terminating decimal expansion.




4. Referring to Sec. 9, Chap. 2, discuss the possibility of replacing 10 in the fore-
going by another integer $b$.


5. The binomial theorem 


Many useful formulas obtained from mathematical induction involve fractions, at 
least in appearance. The binomial theorem is a case in point, and that is why we 
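have deferred discussion of it until now. The formula involves certain numbers,
called binomial coefficients, which we now define.
For any integer $n \ge 0$ and for any integer $k$ we denote by the symbol $\binom{n}{k}$ the
rational number

(5.1)    $\binom{n}{k} = \dfrac{n!}{k!\,(n - k)!}$ if $0 \le k \le n$, and $\binom{n}{k} = 0$ otherwise

Recall that by definition $0! = 1$. Hence

(5.2)    $\binom{n}{0} = \binom{n}{n} = 1 \qquad \binom{n}{1} = \binom{n}{n-1} = n$

From (5.1) it is clear that

(5.3)    $\binom{n}{k} = \binom{n}{n-k}$ for all $k$

Observe that by canceling out either $k!$ or $(n - k)!$ in (5.1) we get

(5.4)    $\binom{n}{k} = \dfrac{n(n-1) \cdots (n-k+1)}{k!} = \dfrac{n(n-1) \cdots (k+1)}{(n-k)!}$    $(0 \le k \le n)$

For $k = 0, 1, \ldots, n$ the values are

$\binom{0}{k}$ = 1
$\binom{1}{k}$ = 1, 1
$\binom{2}{k}$ = 1, 2, 1
$\binom{3}{k}$ = 1, 3, 3, 1
$\binom{4}{k}$ = 1, 4, 6, 4, 1
$\binom{5}{k}$ = 1, 5, 10, 10, 5, 1
etc.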


The binomial coefficients are connected by many identities. One that we require
here is

(5.5)    $\binom{n}{k-1} + \binom{n}{k} = \binom{n+1}{k}$ for all $k$

Proof. If $k = 0$, then (5.5) becomes $\binom{n}{-1} + \binom{n}{0} = \binom{n+1}{0}$, which
is correct because $\binom{n}{-1} = 0$ and $\binom{n}{0} = \binom{n+1}{0} = 1$. For $k < 0$ all
three terms in (5.5) are zero.

If $k = n + 1$, then (5.5) reads $\binom{n}{n} + \binom{n}{n+1} = \binom{n+1}{n+1}$, which is
correct, by (5.2), since $\binom{n}{n+1} = 0$. If $k > n + 1$, then all three terms
in (5.5) are zero.

We are then left with $k$ in the range $1 \le k \le n$, and for such $k$ all three
terms of (5.5) are given by the first expression in (5.1). Thus the left-hand
side of (5.5) now becomes

$$\frac{n!}{(k-1)!\,(n-k+1)!} + \frac{n!}{k!\,(n-k)!} = \frac{n!}{(k-1)!\,(n-k)!} \left( \frac{1}{n-k+1} + \frac{1}{k} \right)$$
$$= \frac{n!}{(k-1)!\,(n-k)!} \cdot \frac{n+1}{k(n-k+1)} = \frac{(n+1)!}{k!\,(n+1-k)!} = \binom{n+1}{k}$$

Q.E.D.

From (5.5) we derive the following important theorem.
THEOREM 5.1 The binomial coefficients $\binom{n}{k}$ are integers.

Proof. We use induction on $n$. The assertion is clearly true for $n = 0$
and $n = 1$. Suppose now that for some $n$ it is true that $\binom{n}{k}$ is an integer
for all $k$. Then for any $k$ the left-hand side of (5.5) is an integer, and so
$\binom{n+1}{k}$ is also an integer for all $k$. Therefore the truth of the assertion
for $n$ implies its truth for $n + 1$, and so the assertion holds for all $n$ (and
$k$). Q.E.D.
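
The induction in this proof amounts to building each row of binomial coefficients from the previous one using additions only. The sketch below is our own illustration, assuming Python's `fractions` and `math` modules; the names `binom_formula` and `pascal_row` are hypothetical. It compares the additive construction with the quotient of factorials from the definition.

```python
from fractions import Fraction
from math import factorial

def binom_formula(n, k):
    """The binomial coefficient as a rational number, by the factorial definition."""
    if 0 <= k <= n:
        return Fraction(factorial(n), factorial(k) * factorial(n - k))
    return Fraction(0)

def pascal_row(n):
    """Row n of binomial coefficients, built with integer additions only."""
    row = [1]
    for _ in range(n):
        # each inner entry is the sum of its two neighbours in the previous row
        row = [1] + [row[k - 1] + row[k] for k in range(1, len(row))] + [1]
    return row

for n in range(8):
    assert all(binom_formula(n, k) == pascal_row(n)[k] for k in range(n + 1))
print(pascal_row(5))
```

Since `pascal_row` never divides, its entries are integers by construction, which is the content of the theorem.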


THEOREM 5.2 (Binomial theorem) Let $a$, $b$ be two elements of a ring $A$, and suppose
that $ab = ba$. Then, for any positive integer $n$,

(5.6)    $(a + b)^n = a^n + \binom{n}{1} a^{n-1} b + \binom{n}{2} a^{n-2} b^2 + \cdots + \binom{n}{n-1} a b^{n-1} + b^n$

Proof. First of all, the right-hand side makes sense because $\binom{n}{k}$ is an
integer, and therefore the expression $\binom{n}{k} c$ is defined for any element $c$ in
$A$; namely, it denotes the result of adding $c$ to itself $\binom{n}{k}$ times (see Sec.
7F, Chap. 2). If we define $a^0 = b^0$ = unit element of $A$, as usual, then
(5.6) can be written in the more compact form

(5.7)    $(a + b)^n = \sum_{k=0}^{n} \binom{n}{k} a^{n-k} b^k$

We prove this formula by induction on $n$. For $n = 1$ it is true, for then
(5.7) reduces to $(a + b)^1 = a + b$, which is certainly correct. Now sup-
posing that (5.7) holds for some $n$ we shall show that it must hold for
$n + 1$, hence for all positive integers $n$, by mathematical induction. But
first recall that if $ab = ba$, as assumed, then $a^k b^l = b^l a^k$ for any positive
integers $k$, $l$ (see Exercise 1, Sec. 7, Chap. 2). This fact is used in the cal-
culations below:
Assuming that (5.7) holds for $n$, we have

$$(a + b)^{n+1} = (a + b) \cdot \sum_{k=0}^{n} \binom{n}{k} a^{n-k} b^k = \sum_{k=0}^{n} \binom{n}{k} (a + b)\, a^{n-k} b^k$$

by the distributive law. Since $(a + b)\, a^{n-k} b^k = a^{n-k+1} b^k + b\, a^{n-k} b^k =
a^{n+1-k} b^k + a^{n-k} b^{k+1}$, we get

$$(a + b)^{n+1} = \sum_{k=0}^{n} \binom{n}{k} a^{n+1-k} b^k + \sum_{k=0}^{n} \binom{n}{k} a^{n-k} b^{k+1}$$

Now put $k = j$ in the first sum and $k = j - 1$ in the second. There re-
sults

$$(a + b)^{n+1} = \sum_{j=0}^{n} \binom{n}{j} a^{n+1-j} b^j + \sum_{j=1}^{n+1} \binom{n}{j-1} a^{n+1-j} b^j$$

Since $\binom{n}{j} = 0$ if $j > n$ and $\binom{n}{j-1} = 0$ if $j < 1$, we can let both sums
above run from $j = 0$ to $j = n + 1$, and then they can be combined to
give

$$(a + b)^{n+1} = \sum_{j=0}^{n+1} \left[ \binom{n}{j} + \binom{n}{j-1} \right] a^{n+1-j} b^j = \sum_{j=0}^{n+1} \binom{n+1}{j} a^{n+1-j} b^j \qquad \text{by (5.5)}$$

This is precisely Eq. (5.7) with $n$ replaced by $n + 1$. Therefore (5.7)
holds for all $n > 0$. Q.E.D.


EXAMPLE Applying this to $Z$, take $a = b = 1$; there results the formula

(5.8)    $2^n = \sum_{k=0}^{n} \binom{n}{k}$
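
For integers the binomial formula can be checked numerically. A brief sketch, assuming Python 3.8 or later for `math.comb`; the function name `binomial_sum` is ours.

```python
from math import comb

def binomial_sum(a, b, n):
    """Sum over k = 0..n of C(n, k) * a^(n-k) * b^k, for integers a and b."""
    return sum(comb(n, k) * a ** (n - k) * b ** k for k in range(n + 1))

assert binomial_sum(3, -5, 7) == (3 - 5) ** 7
assert binomial_sum(1, 1, 10) == 2 ** 10    # the case a = b = 1
print(binomial_sum(1, 1, 10))
```

Integers always commute, so the hypothesis $ab = ba$ is automatic here; in a noncommutative ring the expansion can fail already for $n = 2$.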




The binomial coefficients arise frequently in certain types of counting problems.
The following theorem is of great importance in this connection:

THEOREM 5.3 Let $E$ be a set containing $n$ elements. Then the number of distinct
subsets of $E$ containing $k$ elements is equal to $\binom{n}{k}$ for $1 \le k \le n$.

Proof. We use induction on $n$. The assertion is clearly true for $n = 1$.
Suppose that it holds for all $n$ smaller than some integer $n_0$. We show
that it must necessarily hold for $n_0$ also.

Let $E$ be a set containing $n_0$ elements, and select a fixed element $x$ in $E$.
Now it is obvious that the assertion of the theorem holds here for $k = 1$
and for $k = n_0$, by (5.2). We therefore take $k$ such that $2 \le k \le n_0 - 1$.
The number of subsets of $E$ containing $k$ elements is equal to the sum of

$A$: the number of subsets of $k$ elements including $x$

$B$: the number of subsets of $k$ elements not including $x$

For $A$, $k - 1$ of the elements must be taken from among the $n_0 - 1$ ele-
ments different from $x$. By our induction assumption we have therefore
$A = \binom{n_0 - 1}{k - 1}$. For $B$, all $k$ of the elements must be taken from the $n_0 - 1$
elements different from $x$. Therefore, from our induction assumption,
$B = \binom{n_0 - 1}{k}$. Hence $A + B = \binom{n_0}{k}$ by (5.5). We have shown that the
assertion of the theorem must indeed be true for $n_0$. Hence it is true for
all positive integers $n$, by mathematical induction. Q.E.D.


EXAMPLE The number of ways of choosing two elements from a set of four is
equal to $\binom{4}{2} = 6$.
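
The counting argument can be mirrored in a short program. This is only an illustrative sketch: the function `count_subsets` and the four-element set `E` are our own choices, and `itertools.combinations` and `math.comb` come from the Python standard library. The recursion splits the subsets according to whether or not they contain a fixed element $x$, as in the proof.

```python
from itertools import combinations
from math import comb

def count_subsets(n, k):
    """Number of k-element subsets of an n-element set, by the split used in the proof."""
    if k < 0 or k > n:
        return 0
    if k == 0 or k == n:        # k = 0 (the empty subset) is added for convenience
        return 1
    with_x = count_subsets(n - 1, k - 1)     # the number called A in the proof
    without_x = count_subsets(n - 1, k)      # the number called B in the proof
    return with_x + without_x

E = ["w", "x", "y", "z"]                     # a hypothetical four-element set
pairs = list(combinations(E, 2))             # all two-element subsets, listed explicitly
assert len(pairs) == count_subsets(4, 2) == comb(4, 2)
print(pairs)
```

Listing the subsets explicitly is feasible only for small sets, whereas the recursion or the formula gives the count directly.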


EXERCISES 


LP the identities (7) =(," ana tu 
rove the identities (7 an XG (-1) 


2. Prove that $k!$ divides the product of any $k$ consecutive positive integers.

3. For any prime $p$ prove that $p$ divides $\binom{p}{1}, \binom{p}{2}, \ldots, \binom{p}{p-1}$.

4. Let $p$ be a prime, and let $x, y, z, \ldots, w$ be any finite set of integers. Prove
that $(x + y + z + \cdots + w)^p \equiv x^p + y^p + z^p + \cdots + w^p \pmod{p}$.

5, Let p be a prime, and let a, 6, q be integers such that 0 <a <bandpig 


0. 


%, 
Prove that m = @) is divisible by p'* but not by p'-°4, [Hint: Using (5.4) 
write m in the form 
pegp’g — D+ ++ hg —&) + + Gg — +) 
pip? —1)--- @e—k--- 1 
and show that p'q — k and p* — k are divisible by exactly the same power of p.] 




6. Let $a$, $b$, $c$ be elements of an integral domain. Prove that

$$(a + b + c)^n = \sum \frac{n!}{p!\,q!\,r!}\, a^p b^q c^r$$

where the sum is to be taken for all triples $p$, $q$, $r$ of nonnegative integers such that
$p + q + r = n$. (The numbers $n!/p!\,q!\,r!$ are called trinomial coefficients. Similar
formulas can be proved for any number of terms $a$, $b$, $c$, etc.)


1 1 1 2 

. Prove that — _ =. 

7. Prove at Dg t gg t +e ED a¥i 

8. For any positive integer $n$ prove that $n^n \ge 1 \cdot 3 \cdot 5 \cdots (2n - 1)$.

1 1 1 1 

. Prove that toy... = . 
§ Prove that 3 +g73 t+ °° + Geo DG ae) “pl 
10. Prove that 1? 4290+ + bat =}jentin eh 


*11. Let $K$ be a finite field of characteristic $p$ (see Exercise 6, Sec. 2). Prove that
the mapping of $K$ to itself that sends any element $x$ into $x^p$ is an isomorphism.
12. Let $E$ be a set containing $n$ elements. What is the total number of distinct
subsets of $E$?
13. Let E be a set of n elements, and let F be a set of m elements. How many 
distinct mappings from E to F are there? 


The real-number system 


1. Introduction


Consider for a moment some decimal $A = a_0.a_1a_2a_3 \ldots$, as defined in Chap. 3.
With any such decimal we associated a certain sequence of rational numbers

(1.1)    $A_n = a_0 + \dfrac{a_1}{10} + \cdots + \dfrac{a_n}{10^n}$    $(n = 1, 2, 3, \ldots)$

Now if $A$ happens to be a decimal expansion of a rational number $r$ (that is, if $A$
is recurrent), then according to Definition 4.2, Chap. 3, the numbers $A_n$ get closer
and closer to $r$ as $n$ is made larger and larger. The number $r$ can therefore be
thought of as a “limiting value” of the approximations $A_n$. Now even if $A$ is not a
recurrent decimal the numbers $A_n$ differ from each other very little for large $n$,
for as was shown in Chap. 3 (Corollary of Proposition 4.1), $0 \le A_{n+k} - A_n <
1/10^n$ for any positive integers $n$, $k$. In other words the numbers $A_n, A_{n+1}, A_{n+2},
\ldots$ differ from each other by less than $1/10^n$, and $1/10^n$ can be made as small as
we wish by taking $n$ large enough. Nonetheless there is no rational number which
is approximated arbitrarily well by the $A_n$, in the sense of Definition 4.2, Chap. 3,
because that is true only for recurrent decimals. This suggests that there are
certain “gaps” in the rational-number system $Q$, and in this chapter we shall see
how to go about filling them up.

This problem of gaps can be exhibited in an even more concrete way by another
example: $c$ being a rational number, consider the equation $x^2 = c$. For some num-
bers $c$ this equation has a solution $x$ in $Q$, and for some it does not. For example,
if $c = 4$ then $x^2 = 4$ has two solutions in $Q$, namely 2 and $-2$. But for $c = 2$,
the equation has no solution in $Q$. To prove this, suppose to the contrary that
there is a rational number $x$ whose square is 2. We can write $x = m/n$, where $m$
and $n$ are relatively prime integers (Proposition 3.3, Chap. 3). By assumption
$(m/n)^2 = 2$, or $m^2 = 2n^2$. From this equation we see that if $n > 1$ then any
prime factor $p$ of $n$ must divide $m^2$ and therefore must also divide $m$, by Theorem
8.8, Chap. 2. Hence $m$ and $n$ have the common factor $p$ if $n > 1$, a contradiction.
But if $n = 1$ our equation is $m^2 = 2$, and again we have a contradiction because
clearly 2 is not the square of any integer, being a prime. Thus there is no rational
number whose square is 2.

This negative conclusion need not deter us from trying to find rational numbers
whose squares are very nearly equal to 2. Taking 1 as a first approximation, let
us try to find a better one among the numbers $1.1, 1.2, \ldots, 1.9$. We have
$(1.4)^2 = 1.96$ and $(1.5)^2 = 2.25$, and so let us take 1.4 as a second approximation.
To improve this now try the numbers $1.41, 1.42, \ldots, 1.49$. We find $(1.41)^2 =
1.9881$ and $(1.42)^2 = 2.0164$. Therefore we take 1.41 as a third approximation.
Continuing in this way we obtain a sequence of rational numbers

1, 1.4, 1.41, 1.414, 1.4142, . . .

whose squares are

1, 1.96, 1.9881, 1.999396, 1.99996164, . . .

It is rather obvious, and easy to prove, that by continuing this procedure we can
obtain rational numbers whose squares are as close to 2 as we may desire. It is
also clear that the procedure determines a certain decimal $1.414213 \cdots$.
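
The trial-and-error procedure just described is easily mechanized with exact rational arithmetic. What follows is a sketch only, assuming Python's `fractions` module; the function name `sqrt_approximations` is ours. At each stage it picks the largest digit for which the square does not exceed the target.

```python
from fractions import Fraction

def sqrt_approximations(b, steps):
    """Return [A_0, ..., A_steps], where A_n is the largest decimal with
    n digits after the point whose square does not exceed b."""
    A = Fraction(0)
    while (A + 1) ** 2 <= b:          # integer part
        A += 1
    approx = [A]
    for n in range(1, steps + 1):
        step = Fraction(1, 10 ** n)
        d = 0
        while d < 9 and (A + (d + 1) * step) ** 2 <= b:
            d += 1                    # largest digit keeping the square <= b
        A += d * step
        approx.append(A)
    return approx

for A in sqrt_approximations(2, 4):
    print(A, A ** 2)                  # each approximation with its square, as exact fractions
```

With target 2 and four steps this gives the approximations 1, 1.4, 1.41, 1.414, 1.4142 as exact fractions.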

These considerations naturally lead to the suggestion that it would be profitable 
to enlarge the rational-number system Q by including all decimals. We shall in 
fact do something very like that. But decimals themselves are very awkward for 
theoretical purposes, as the reader will quickly ascertain by trying to define the 
sum or product of two decimals in such a way as to get again a decimal. More- 
over decimals provide only one rather special method of approximation, and there 
is no point in confining ourselves to it alone. 

We mention briefly another type of approximation. The method leads to a
rather interesting part of arithmetic, continued fractions. For the sake of brevity
we limit ourselves to the problem of finding rational numbers whose squares are
close to 2. For this special case the method is based upon the observation that if
$c$ is any positive rational number, and if $c' = (2 + c)/(1 + c)$, then $c'^2$ is closer to 2
than is $c^2$ (see Exercise 12, Sec. 3, Chap. 3). Taking $c = 1$ we get $c' = \tfrac32$. Then
taking $c = \tfrac32$ we get $c' = \tfrac75$, and so on. In this way we obtain a sequence of
rational numbers

$$1,\ \tfrac{3}{2},\ \tfrac{7}{5},\ \tfrac{17}{12},\ \tfrac{41}{29},\ \tfrac{99}{70},\ \tfrac{239}{169},\ \ldots$$

and it is not hard to prove that their squares eventually get arbitrarily close to 2.
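
The iteration $c \mapsto (2 + c)/(1 + c)$ can be run exactly. A minimal sketch, assuming Python's `fractions` module; the function name `approximations` is our own.

```python
from fractions import Fraction

def approximations(c0, steps):
    """Iterate c -> (2 + c)/(1 + c) starting from c0; return all the terms."""
    seq = [Fraction(c0)]
    for _ in range(steps):
        c = seq[-1]
        seq.append((2 + c) / (1 + c))
    return seq

seq = approximations(1, 6)
errors = [abs(2 - c ** 2) for c in seq]
# each square is strictly closer to 2 than the one before it
assert all(later < earlier for earlier, later in zip(errors, errors[1:]))
print([str(c) for c in seq])
```

The assertion checks, for these seven terms only, the statement of the exercise cited above that each step brings the square closer to 2.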

The central point of the foregoing, and indeed of the entire chapter, is the notion
of approximation by rational numbers. The discerning reader may have observed
a certain element of vagueness in the use of such terms as “near” and “close to” in
the remarks above. In the following section we shall introduce terminology which
will make it easy to deal with these concepts precisely.


EXERCISES
1. Prove that there is no rational number whose square is 3. Do the same for
5, 6, and 12.
2. Prove that no integer is the square of a rational number unless it is the square
of an integer.
3. Prove that there is no rational number whose cube is 5.

4. $k$ being a positive integer, prove that if $n$ is an integer and $n = r^k$, where $r$
is a rational number, then $r$ must be an integer. What can you say about the
prime decomposition of $n$ if that is so? Prove that the integer 99,225,000 cannot
be the $k$th power of any integer, for any $k \ge 2$.

5. Let $b$ be a positive integer. Devise a method for determining a decimal
$A = a_0.a_1a_2 \cdots$ such that the rational numbers $A_n$ have squares that ap-
proach $b$ as closely as desired. In other words, show that, once $A_{n-1}$ is known,
the digit $a_n$ is determined by the condition of being the largest integer such that
$a_n^2 + 2 \cdot 10^n A_{n-1} a_n \le (b - A_{n-1}^2) \cdot 10^{2n}$.


2. Cauchy sequences and limits 
The following definition is basic for the rest of this chapter. 


DEFINITION 2.1 Let $a_1, a_2, a_3, \ldots, a_n, \ldots$ be an infinite sequence of elements in
an ordered field $K$. Then it is called a Cauchy sequence† if for every positive element
$e$ in $K$ there exists an integer $P$ (depending in general upon $e$) such that

(2.1)    $|a_n - a_m| < e$ for all $m, n > P$


The definition merely says that the elements in the sequence must all be close 
together provided that we go far enough out in the sequence. The point is that 
it gives to the informal notion “close together” a precise meaning that will stand 
up in any court. It might be expected that there is some connection with decimals. 


We prove
PROPOSITION 2.1 Let $A = a_0.a_1a_2a_3 \cdots$ be any decimal, and let $A_n$ be the rational
number defined by Eq. (1.1). Then those numbers $A_1, A_2, A_3, \ldots$ form a Cauchy
sequence in the rational field $Q$.

Proof. Let $e$ be any positive rational number. Our problem is to show
that there is an integer $P$ such that $|A_n - A_m| < e$ whenever $m, n > P$.
This is very easy. For there is certainly a positive integer $P > 1/e$, by
Theorem 3.7, Chap. 3. Since $10^P > P$ (Proposition 9.1, corollary,
Chap. 2) we have then $10^P > 1/e$, or $e > 1/10^P$. Now suppose that $m,
n > P$, with say $m \le n$. By the Corollary of Proposition 4.1, Chap. 3,
we then have $0 \le A_n - A_m < 1/10^m < 1/10^P < e$. Q.E.D.
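
The choice of $P$ in this proof is completely explicit, and it can be followed numerically. Below is a sketch under our own assumptions: the names `cauchy_index` and `partial_sums` are hypothetical, and the digits $a_j = 7j \bmod 10$ are an arbitrary illustrative choice. A finite check of this kind illustrates the statement but does not prove it.

```python
from fractions import Fraction
from math import floor

def cauchy_index(e):
    """A positive integer P with P > 1/e, for a positive rational e."""
    return floor(1 / Fraction(e)) + 1

def partial_sums(a0, digit_at, n_max):
    """[A_0, A_1, ..., A_{n_max}] for the decimal a0.a1a2..., with a_j = digit_at(j)."""
    sums = [Fraction(a0)]
    for j in range(1, n_max + 1):
        sums.append(sums[-1] + Fraction(digit_at(j), 10 ** j))
    return sums

e = Fraction(1, 20)
P = cauchy_index(e)
A = partial_sums(3, lambda j: (7 * j) % 10, P + 12)    # hypothetical digits
tail = range(P + 1, P + 13)                            # some indices beyond P
assert all(abs(A[n] - A[m]) < e for n in tail for m in tail)
print("P =", P)
```

The bound $P > 1/e$ is far from the smallest index that works, since $1/10^m$ drops below $e$ much sooner, but the proof only needs some $P$ to exist.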


Some further examples of Cauchy sequences are given below. 


EXAMPLE 1 $1, 2, 3, \ldots, n, \ldots$ is not a Cauchy sequence in $Q$.

For clearly $|a_n - a_m| = |n - m| \ge 1$ if $n \ne m$. If in Definition 2.1 we take
$e = \tfrac12$, say, then we can never have $|a_n - a_m| < \tfrac12$ for all $n, m > P$, no matter
how large we take $P$.


† After the French mathematician Augustin-Louis Cauchy (1789-1857).




EXAMPLE 2 For any rational number $c$ the sequence $c, c, c, \ldots$ is a Cauchy
sequence in $Q$.
Indeed here we have $a_n = c$ for all $n$, and so $|a_n - a_m| = 0$, whatever $m$ and $n$.
Hence (2.1) is certainly satisfied. A Cauchy sequence of this trivial type is called
a constant sequence.

EXAMPLE 3 The sequence $\tfrac12, -\tfrac12, \tfrac12, -\tfrac12$, etc., is not a Cauchy sequence in $Q$.
Here $a_n = \pm\tfrac12$, and $|a_n - a_m| = 1$ whenever $n - m$ is odd. Therefore condi-
tion (2.1) cannot be satisfied if we take, say, $e = \tfrac{99}{100}$.

EXAMPLE 4 $1, -\tfrac12, \tfrac14, -\tfrac18, \tfrac1{16}$, etc., is a Cauchy sequence in $Q$.

Here $a_n = \pm 1/2^n$, and so $a_n - a_m = \pm 1/2^n \pm 1/2^m$. Taking, say, $n > m$ we
have then $|a_n - a_m| \le 1/2^n + 1/2^m < 2/2^m$. Given now any positive rational
number $e$ there is an integer $P > 1/e$. Since $2^P > P$ there follows $e > 1/2^P$, and
so $|a_n - a_m| < e$ if $m, n > P$.

EXAMPLE 5 $\tfrac12, \tfrac34, \tfrac78, \ldots, (2^n - 1)/2^n, \ldots$ is a Cauchy sequence in $Q$.
The proof is similar to that of Example 4.


Returning again to Definition 2.1, we point out that the condition stated must
be fulfilled for every positive $e$, no matter how small. Hence by taking smaller
and smaller values of $e$ we see that (2.1) requires the elements of the sequence to
be closer and closer together (provided that $m$ and $n$ are large enough).

We stress the fact that there is no general procedure for determining whether a 
given sequence is a Cauchy sequence or not. Each case must be studied individu- 
ally, and frequently some rather ingenious tricks are required, as in the case of 
the following two sequences, which are discussed in any standard calculus text: 


1 
Lithitgt pete and Litgeltptagete 


The first of these is not a Cauchy sequence (in Q), but the second is. 


PROPOSITION 2.2 If $a_1, a_2, a_3, \ldots$ is a Cauchy sequence in an ordered field $K$, then
so is any sequence obtained from it by omitting any elements whatever (provided that
what remains is still an infinite sequence) or by inserting any finite number of new
elements.

The proof is very trivial and is omitted. 

The following proposition shows that the terms in a Cauchy sequence cannot 
become arbitrarily large. 


PROPOSITION 2.3 Let $a_1, a_2, a_3, \ldots$ be a Cauchy sequence in an ordered field $K$.
Then there exists a positive element $c$ in $K$ such that $|a_n| < c$ for all $n$.

Proof. Let $e$ be any fixed positive element of $K$. By definition there
is an integer $P$ such that $|a_n - a_m| < e$ for all $n, m > P$. There follows
in particular $|a_n - a_{P+1}| < e$ for all $n > P$. Hence $|a_n| < e + |a_{P+1}|$ for
all $n > P$. Now put $c = e + |a_1| + \cdots + |a_P| + |a_{P+1}|$. Clearly
$|a_n| < c$ for all $n$. Q.E.D.
PROPOSITION 24 If G1, G2, ds, . . . and by, be, by, . . . are Cauchy sequences in an 
ordered field K, then so are the sequences whose nth ierms are, respectively, @, + Dn, 
Gn — dn, Gad,. Furthermore, if k is a positive integer, then ay*, ast, ast, .. . is a 


Cauchy sequence; and if there is a positive element by such that |b,| > bo for alln = 1, 
2, 3, ete., then 1/b, 1/bs, 1/bs, . . . is a Cauchy sequence. 

Proof. We prove the assertion only for the sequence $a_1b_1, a_2b_2, a_3b_3, \ldots$,
leaving the others as exercises. We start with the trivial identity
\[ a_n b_n - a_m b_m = (a_n - a_m) \cdot b_n + (b_n - b_m) \cdot a_m \]
From the triangle inequality (Exercise 6, Sec. 3, Chap. 2), we get
\[ |a_n b_n - a_m b_m| \le |a_n - a_m| \cdot |b_n| + |b_n - b_m| \cdot |a_m| \]
From Proposition 2.3 there exist fixed positive elements $c_1$, $c_2$ in K such
that $|a_n| < c_1$ and $|b_n| < c_2$ for all $n$. Let $c$ be the larger of $c_1$, $c_2$. Then
we have
\[ |a_n b_n - a_m b_m| \le c \cdot |a_n - a_m| + c \cdot |b_n - b_m| \]
Now let $e$ be any positive element in K, and put $e' = e/2c$. By Definition 2.1
there is an integer $P_1$ such that $|a_n - a_m| < e'$ for all $n, m > P_1$,
and there is similarly an integer $P_2$ such that $|b_n - b_m| < e'$ for all $n$,
$m > P_2$. Let P be the greater of $P_1$, $P_2$. Then for all $n, m > P$ we have
$|a_n - a_m| < e'$ and $|b_n - b_m| < e'$, and so
\[ |a_n b_n - a_m b_m| < ce' + ce' = 2ce' = e \]
Q.E.D.

Therefore $a_1b_1, a_2b_2, \ldots$ is a Cauchy sequence. In particular, $a_1^2, a_2^2, a_3^2, \ldots$
is a Cauchy sequence, and by a simple induction one finds that $a_1^k, a_2^k, \ldots$ is a
Cauchy sequence for any positive integer $k$.
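The identity and the bound used in the proof of Proposition 2.4 can be checked numerically. The following is only a sketch: the two sample sequences `a` and `b` and the bound `c = 3` are assumptions chosen for illustration, and exact rational arithmetic stands in for a general ordered field.

```python
from fractions import Fraction

def a(n):
    # Hypothetical sample sequence a_n = 1 + 1/n (all terms have |a_n| < 3).
    return 1 + Fraction(1, n)

def b(n):
    # Hypothetical sample sequence b_n = 2 - 1/n^2 (all terms have |b_n| < 3).
    return 2 - Fraction(1, n * n)

def identity_holds(n, m):
    # a_n b_n - a_m b_m = (a_n - a_m) b_n + (b_n - b_m) a_m
    lhs = a(n) * b(n) - a(m) * b(m)
    rhs = (a(n) - a(m)) * b(n) + (b(n) - b(m)) * a(m)
    return lhs == rhs

def product_gap_bounded(n, m, c):
    # |a_n b_n - a_m b_m| <= c |a_n - a_m| + c |b_n - b_m|
    gap = abs(a(n) * b(n) - a(m) * b(m))
    bound = c * abs(a(n) - a(m)) + c * abs(b(n) - b(m))
    return gap <= bound
```

Both checks should succeed for every pair of indices, since the common bound 3 dominates all terms of both sample sequences.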


DEFINITION 2.2 Let $a_1, a_2, a_3, \ldots$ be any sequence in an ordered field K, and let $a$
be an element in K. Then the sequence is said to have $a$ as a limit, or, in symbols,
$\lim a_n = a$, if for every positive element $e$ in K there exists an integer P (depending
in general upon $e$) such that
\[ |a_n - a| < e \quad \text{for all } n > P \tag{2.2} \]
If this is true, then the sequence is also said to converge to $a$.
The definition merely expresses in precise terms the intuitive notion that the
elements $a_n$ get arbitrarily close to $a$ if $n$ is taken large enough. For example, if
our field is Q, then we can take (say) $e = 10^{-6}$. The definition then says that $a$
and $a_n$ must differ by less than a millionth for large enough $n$. Replacing $10^{-6}$ by
a still smaller power of 10, the definition says that $a$ and $a_n$ must differ by less than
that smaller amount for large enough $n$. In general, as $e$ is made smaller the integer P
will have to be made larger.
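As a small illustration of how P depends on e, consider the sequence with terms 1/n in Q, whose limit is 0. The sketch below is an assumption-laden example, not part of the text: the function name `threshold` and the choice of sequence are hypothetical.

```python
from fractions import Fraction
import math

def threshold(e):
    """Return an integer P such that |1/n - 0| < e for every n > P.

    Assumes e is a positive Fraction. Any n > P satisfies n > 1/e,
    hence 1/n < e.
    """
    return math.floor(1 / e)
```

Shrinking e from one millionth to one millionth of a millionth raises the returned P, in line with the remark that smaller e generally forces larger P.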


PROPOSITION 2.5 A sequence $a_1, a_2, a_3, \ldots$ in K cannot have more than one limit.
Proof. Suppose that $a$ and $a'$ are two different limits of the sequence.
Then $|a - a'| > 0$, and by definition there is an integer $P_1$ such that
$|a_n - a| < |a - a'|/2$ for all $n > P_1$.† Similarly, there is an integer $P_2$
such that $|a_n - a'| < |a - a'|/2$ for all $n > P_2$. But for any $n$ greater than both
$P_1$ and $P_2$ we have $|a - a'| = |(a - a_n) + (a_n - a')| \le |a - a_n| + |a' - a_n|$,
by the triangle inequality,
and so $|a - a'| < |a - a'|/2 + |a - a'|/2 = |a - a'|$, a contradiction.
Q.E.D.


PROPOSITION 2.6 If a sequence $a_1, a_2, a_3, \ldots$ in K has a limit $a$, then it must be a
Cauchy sequence.
Proof. Let $e$ be a positive element in K, and let P be an integer such
that $|a_n - a| < e/2$ for all $n > P$. Then $|a_n - a_m| = |a_n - a + a - a_m| \le
|a_n - a| + |a - a_m| < e/2 + e/2 = e$ for all $m, n > P$. Q.E.D.


The converse of this theorem is not true in general. That is, a Cauchy sequence 
may very well not have a limit. 


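PROPOSITION 2.7 Let $c$ be a rational number with $|c| < 1$. Then the numbers $1$,
$1 + c$, $1 + c + c^2$, $1 + c + c^2 + c^3$, etc., form a Cauchy sequence in Q and it has as
a limit the number $1/(1 - c)$.

Proof. Put $a_n = 1 + c + c^2 + \cdots + c^{n-1}$. Our problem is to show
that $\lim a_n = 1/(1 - c)$. Now $(1 - c) \cdot a_n = 1 - c^n$, by the identity
proved in Chap. 2, (7.31), and so
\[ a_n - \frac{1}{1 - c} = -\frac{c^n}{1 - c} \]
Let $e$ be any positive rational number. By Exercise 10, Sec. 3, Chap. 3, there is an
integer P such that $|c|^n < e \cdot (1 - c)$ for all $n > P$, and so
\[ \left| a_n - \frac{1}{1 - c} \right| = \frac{|c|^n}{1 - c} < e \quad \text{for all } n > P \]
Q.E.D.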
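The two facts used in the proof of Proposition 2.7 can be verified exactly for a sample ratio. This is a sketch only: the value of `c` is an arbitrary assumption, `partial_sum` is a hypothetical helper, and $a_n$ is taken to be the sum of the first n terms 1, c, ..., c^(n-1).

```python
from fractions import Fraction

def partial_sum(c, n):
    """a_n = 1 + c + c^2 + ... + c^(n-1), computed exactly."""
    return sum(c**k for k in range(n))

c = Fraction(-2, 3)          # assumed sample value with |c| < 1
limit = 1 / (1 - c)          # the claimed limit 1/(1 - c)

for n in (1, 5, 10, 20):
    a_n = partial_sum(c, n)
    # Identity (1 - c) * a_n = 1 - c^n
    assert (1 - c) * a_n == 1 - c**n
    # Distance to the limit is exactly |c|^n / (1 - c)
    assert abs(a_n - limit) == abs(c)**n / (1 - c)
```

Because the distance to the limit equals $|c|^n/(1 - c)$, it can be made smaller than any positive rational e by taking n large enough.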


PROPOSITION 2.8 Let $A = a_0.a_1a_2a_3\cdots$ be a decimal. Then the sequence of numbers
$A_1, A_2, A_3, \ldots$ defined by (1.1) has a limit $r$ in Q if and only if the decimal is
recurrent, and then A is a decimal expansion of $r$.
Proof. Suppose that the sequence $A_1, A_2, A_3, \ldots$ has a limit $r$ in Q.
Then $A_n \le r$ for all $n$, because if, say, $A_k > r$, then for $n > k$ we have
$A_n \ge A_k$ and so $A_n - r \ge A_k - r$, showing that Definition 2.2 is violated.
Now fix some $m$ and let $e$ be any positive rational number. Since
$\lim A_n = r$, it follows by definition that $|r - A_n| < e$ for all sufficiently
large $n$, say for all $n$ greater than some integer P. We have $r - A_m
= r - A_n + A_n - A_m$ and so $r - A_m \le e + A_n - A_m$ provided that
$n > P$. But if $n > m$, as we may assume, then $0 \le A_n - A_m < 1/10^m$,
by the corollary of Proposition 4.1, Chap. 3. Recalling from above that
$A_m \le r$, we now have
\[ 0 \le r - A_m < \frac{1}{10^m} + e \]
Then $r - A_m - 1/10^m < e$. Since $e$ was an arbitrary positive number it
follows that $r - A_m - 1/10^m \le 0$. Hence we have finally $0 \le r - A_m \le
1/10^m$ for all $m$, showing that A is indeed a decimal expansion of $r$, according
to Definition 4.2, Chap. 3. Therefore A must be a recurrent
decimal, by the results of Sec. 4, Chap. 3.

† Here 2 stands for 1 + 1, where 1 is the unit element of K.

To prove the converse, let A be a recurrent decimal, say with $a_{j+P} = a_j$
for all $j > k$, P therefore being the period of recurrence. Consider the
sequence $b_1 = A_{k+P}$, $b_2 = A_{k+2P}$, ..., $b_n = A_{k+nP}$, .... Then
\[ b_{n+1} - b_n = \frac{a_{k+nP+1}}{10^{k+nP+1}} + \cdots + \frac{a_{k+(n+1)P}}{10^{k+(n+1)P}} \]
Since $a_{k+nP+1} = a_{k+1}$, etc., there follows
\[ b_{n+1} - b_n = \frac{1}{10^{nP}} \left( \frac{a_{k+1}}{10^{k+1}} + \cdots + \frac{a_{k+P}}{10^{k+P}} \right) = \frac{1}{10^{nP}} \, (A_{k+P} - A_k) \]
Now $b_{n+1} = b_1 + (b_2 - b_1) + (b_3 - b_2) + \cdots + (b_{n+1} - b_n)$, and so by
the formula just established we have
\[ b_{n+1} = b_1 + (A_{k+P} - A_k) \left( \frac{1}{10^P} + \frac{1}{10^{2P}} + \cdots + \frac{1}{10^{nP}} \right) \]
Putting $b'_n = 1 + \dfrac{1}{10^P} + \cdots + \left( \dfrac{1}{10^P} \right)^{n-1}$, we see
that the sequence $b'_1, b'_2, b'_3, \ldots$ has a limit, namely, $10^P/(10^P - 1)$, by
Proposition 2.7. Since $b_{n+1} = b_1 + \dfrac{A_{k+P} - A_k}{10^P} \cdot b'_n$, it is easy to see
that the sequence $b_1, b_2, b_3, \ldots$ also has a limit $r$, namely,
\[ r = \lim b_n = \lim A_{k+nP} = b_1 + \frac{A_{k+P} - A_k}{10^P} \cdot \frac{10^P}{10^P - 1} \]
Since $b_1 = A_{k+P}$, we have
\[ r = A_{k+P} + \frac{A_{k+P} - A_k}{10^P - 1} = \frac{10^P A_{k+P} - A_k}{10^P - 1} \]
(cf. Theorem 4.3, Chap. 3). This shows that the sequence $A_{k+P}$, $A_{k+2P}$,
$A_{k+3P}$, etc., has the limit $r$. It follows easily from the fact that $A_1 \le
A_2 \le A_3 \le \cdots$ that the original sequence $A_1, A_2, A_3, \ldots$ also has the
limit $r$. By what was shown above, A must be a decimal expansion of $r$.
Q.E.D.
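The second half of the proof of Proposition 2.8 yields a closed form for the limit of a recurrent decimal in terms of two truncations: $r = (10^P A_{k+P} - A_k)/(10^P - 1)$, where the digits repeat with period P after the first k places. The sketch below evaluates that expression exactly; the function names and the way the digits are passed (a non-repeating prefix and one period, both as lists) are assumptions made for illustration.

```python
from fractions import Fraction

def truncation(a0, digits, n):
    """A_n = a0 + a_1/10 + ... + a_n/10^n, where digits = [a_1, a_2, ...]."""
    return Fraction(a0) + sum(
        Fraction(d, 10**(j + 1)) for j, d in enumerate(digits[:n])
    )

def recurrent_value(a0, prefix, period):
    """Limit of the decimal a0.prefix period period ... as an exact fraction.

    prefix holds the k non-repeating digits, period the P repeating digits.
    """
    k, P = len(prefix), len(period)
    digits = prefix + period
    A_k = truncation(a0, digits, k)        # truncation after k places
    A_kP = truncation(a0, digits, k + P)   # truncation after k + P places
    return (10**P * A_kP - A_k) / (10**P - 1)
```

For example, `recurrent_value(0, [1], [6])` evaluates the decimal 0.1666... and `recurrent_value(0, [], [3])` evaluates 0.333....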


The main idea of this section was to formulate a concept, namely, Cauchy sequences,
which includes the notion of decimals (in the sense of Theorem 2.1) and
is free from the cumbersome dependence on powers of 1/10. For that reason they
are far easier to deal with than decimals. Another advantage of Cauchy sequences 
is that they can be defined in systems in which decimals cannot be defined. As 
Proposition 2.8 shows, only certain Cauchy sequences in Q have limits, and in the 
next section we shall see that Q can be embedded in a larger ordered field R in 
which all Cauchy sequences have limits. 


EXERCISES 
1. Complete the proof of Proposition 2.4.
2. Let $a_1, a_2, a_3, \ldots$ be a Cauchy sequence in an ordered field, and let it have
a limit $a$. Prove that any sequence obtained from the given one, either by omitting
any terms whatever (but in such a way that what remains is still an infinite
sequence), or by inserting any finite number of new terms, has the same limit $a$.
3. Let $a_1, a_2, a_3, \ldots$ and $b_1, b_2, b_3, \ldots$ be sequences in an ordered field having
limits $a$, $b$, respectively. Prove the following:
(a) If $a_n < b_n$ for all but possibly a finite number of $n$, then $a \le b$. (Give
an example where equality holds.)
(b) $a + b = \lim (a_n + b_n)$.
(c) $ab = \lim a_n b_n$.
(d) If $b \ne 0$, then $a/b = \lim a_n/b_n$. (Show first that only a finite number of
the $b_n$ can be zero.)
(e) For any positive integer $k$, $ab = \lim a_n b_{n+k}$.
(f) For any positive integer $k$, $\lim a_n^k = a^k$.
(g) For any $c_1$, $c_2$ in K, $c_1 a + c_2 b = \lim (c_1 a_n + c_2 b_n)$.


4. Prove that $\lim_{n \to \infty} \sum_{k=1}^{n} \dfrac{1}{k(k+1)} = 1$. That is, prove that if
$a_n = \dfrac{1}{1 \cdot 2} + \dfrac{1}{2 \cdot 3} + \cdots + \dfrac{1}{n(n+1)}$, then $\lim a_n = 1$.
[Hint: Consider the expression $\dfrac{1}{k} - \dfrac{1}{k+1}$.]


5. Let $r$ be a rational number with $|r| < 1$, and let $a$ be any rational number.
Prove that
\[ \lim_{n \to \infty} \sum_{k=0}^{n} a r^k = \frac{a}{1 - r} \]




6. Let $a_1, a_2, a_3, \ldots$ be a sequence of positive elements in an ordered field,
and suppose that $a_1 > a_2 > a_3 > \cdots$, etc. Prove that the sequence whose nth term
is $s_n = a_1 - a_2 + a_3 - + \cdots \pm a_n$ (alternating signs) is a Cauchy sequence.


3. The field of real numbers


Here we deal with the field of real numbers in much the same way we took up the
rational-number system in Chap. 3, namely, by specifying its properties.


DEFINITION 3.1 An ordered field K is said to be complete if every Cauchy sequence in 
K has a limit in K. 


DEFINITION 3.2 A field R is called a field of real numbers if it is a complete ordered
field containing the rational-number system Q as a subfield and if furthermore every
element of R is the limit of some sequence of elements in Q.


REMARK. A propos the requirement that R contain Q, one should realize that 
any ordered field contains a subfield isomorphic to Q. Cf. Exercise 7, Sec. 3, 
Chap. 3. 


The intuitive idea of this last requirement is that every element of R can be
approximated arbitrarily well by elements in Q. Of course the very notion of
approximation here depends upon the notion of ordering of a field. Let us first
observe that the ordering of R must necessarily be compatible with that already
defined in Q. Recall that the square of a nonzero element in an ordered integral
domain is always positive. Hence for the unit element 1 of R, which must be the
same as the unit element of Q, namely, the integer 1, we have $1^2 = 1$; hence $1 > 0$
in R. By the ordering axioms, if $a > 0$ and $b > 0$ in R, then also $a + b > 0$ and
$ab > 0$. In particular, $2 = 1 + 1 > 0$ in R. By a simple induction one finds that
every positive integer $n$ is also a positive element of R. Now let $r = m/n$ be a
positive element of Q. By definition, $mn$ must be a positive integer; hence also
a positive element of R. Since $1/n^2 = (1/n) \cdot (1/n)$ is positive in R, it follows that
$(1/n^2) \cdot mn = r$ is positive in R. Hence positive elements of Q are also positive in R.

The following theorem, analogous to Theorem 3.7, Chap. 3, shows that elements
of R are comparable in size to elements of Z. That is, there are no elements in R
which are greater than all elements of Z.


THEOREM 3.1 Let $c$ be an element of a field of real numbers R. Then there is a unique
integer $q$ such that $q \le c < q + 1$.

Proof. By Definition 3.2 the element $c$ is the limit of some sequence
$c_1, c_2, c_3, \ldots$ in Q. Then, by Definition 2.2, there is an integer P such
that $|c_n - c| < 1$ for all $n > P$. Fixing such a $c_n$ (it is a rational number),
there is an integer $q'$ such that $q' \le c_n < q' + 1$, by Theorem 3.7, Chap. 3.
Since $c_n$ and $c$ differ by less than 1, we have $q' - 1 < c < q' + 2$. The
assertion follows easily from this. Q.E.D.




COROLLARY 1 If $e$ is any positive element in R, then there is a positive integer $n$ such
that $1/n < e$.
Proof. There is an integer $n > 1/e$ by the theorem above.


Thus for any positive element of R there is always a positive rational number 
which is smaller. 


COROLLARY 2 If $c$ is any element of R and if $e$ is any positive element of R, then
there are rational numbers $a$, $b$ such that $a < c < b$ and $c - a < e$, $b - c < e$.

Proof. Let $e'$ be a rational number with $0 < e' < \tfrac{1}{2}e$. Now by Definition 3.2
we have $c = \lim c_n$, where $c_1, c_2, c_3, \ldots$ is a sequence of
rational numbers. By Definition 2.2 there is an integer P such that
$|c_n - c| < e'$ for all $n > P$. Fixing such a $c_n$, simply put $a = c_n - e'$ and
$b = c_n + e'$. Q.E.D.


COROLLARY 3 A sequence $a_1, a_2, a_3, \ldots$ of rational numbers is a Cauchy sequence
in R if and only if it is a Cauchy sequence in Q.
Proof. Let $a_1, a_2, a_3, \ldots$ be a Cauchy sequence in Q, and let $e$ be
any positive element in R. Let $e'$ be a positive rational number with
$e' < e$. Such an $e'$ exists by Corollary 1 above. By Definition 2.1, there
exists an integer P such that $|a_n - a_m| < e'$ for all $n, m > P$. Therefore
$|a_n - a_m| < e$ for all $n, m > P$, and consequently the sequence satisfies
the requirements for a Cauchy sequence in R. The converse is trivial.
Q.E.D.


The following theorem shows that there can be essentially just one field of real 
numbers. 


THEOREM 3.2 If R and R’ are two fields of real numbers, then there is one and only 
one isomorphism f from R to R’, and f must send every element of Q into itself. 

Proof. By Definition 2.3, Chap. 2, $f$ is required to satisfy the conditions
$f(a + b) = f(a) + f(b)$ and $f(ab) = f(a) \cdot f(b)$. Put $b = 1$ in the latter,
obtaining $f(a) = f(a) \cdot f(1)$. This shows that if $f(1) = 0$, then $f$ must map
every element of R into 0. We assume then that $f(1) \ne 0$. Putting $a = 1$
in the equation above gives $f(1) = f(1) \cdot f(1)$, whence $f(1) \cdot [1 - f(1)] = 0$.
It follows that $f(1) = 1$. Then $f(1 + 1) = f(1) + f(1) = 1 + 1$; that is,
$f(2) = 2$. By a simple induction we find that $f(n) = n$ for every positive
integer $n$. From (2.1), Chap. 2, we have $f(0) = 0$ and $f(-a) = -f(a)$. In
particular, $f(-n) = -f(n) = -n$ for any positive integer $n$. This shows
that $f$ must map every element of Z into itself. Further, $1 = f(1) =
f(n \cdot n^{-1}) = f(n) \cdot f(n^{-1}) = n \cdot f(n^{-1})$ for any positive integer $n$, and
therefore $f(n^{-1}) = n^{-1}$. Hence $f(m/n) = f(m \cdot n^{-1}) = f(m) \cdot f(n^{-1}) = m \cdot n^{-1} =
m/n$. We conclude that $f$ must map every element of Q into itself.
To complete the proof we must borrow a result proved below (Theorem
4.3). It is entirely independent of the matter at hand. Namely, Theorem
4.3 shows that if $x$ is a positive element of R, then there is an element $y$ of
R such that $y^2 = x$. Then $f(x) = f(y^2) = f(y \cdot y) = f(y) \cdot f(y)$, and so
$f(x) \ge 0$, by Theorem 3.8, Chap. 2. Take a rational number $r$ such that
$0 < r < x$. By what we have just shown, $f(x - r) \ge 0$, since $x - r$ is
positive. But $f(x - r) = f(x) - f(r) = f(x) - r$ (by the result proved
above). Thus $f(x) - r \ge 0$, or $f(x) \ge r > 0$. Therefore $f(x)$ must be
positive. Now if $b > a$ in R, then $b - a > 0$; by the result just established,
$f(b - a) > 0$. But $f(b - a) = f(b) - f(a)$, and we conclude that
$f(b) > f(a)$, so that $f$ must preserve ordering.

Now let $c_1, c_2, c_3, \ldots$ be a Cauchy sequence of rational numbers. It is
then a Cauchy sequence in both R and R', by Corollary 3 above. It therefore
has a limit $c$ in R and a limit $c'$ in R', by Definition 3.2. We claim that
$f(c) = c'$. For let $e$ be a positive rational number. Then by definition of
limit there is an integer P such that $|c - c_n| < e$ for all $n > P$. Since $f$
preserves order, it follows that $|f(c) - f(c_n)| < f(e)$. Since $c_n$, $e$ are
rational, we have $f(c_n) = c_n$ and $f(e) = e$. Therefore $|f(c) - c_n| < e$ for
$n > P$. Consequently $f(c)$ is the limit of the sequence $c_1, c_2, c_3, \ldots$ in
R', and so $f(c) = c'$, by Proposition 2.5. It follows easily that $f$ is one-to-one,
although we merely assumed that $f$ is a ring-homomorphism such that not
every element of R is mapped into zero.

Now given R and R', this analysis shows us how to go about constructing
a mapping $f$ satisfying the requirements. Namely, we start by defining
$f(a) = a$ for any rational number $a$. Then for any element $c$ in R
we select a Cauchy sequence $c_1, c_2, c_3, \ldots$ in Q having $c$ as a limit (one
exists, by Definition 3.2). The same sequence must have a limit $c'$ in R',
and we simply define $f(c) = c'$. It is easily verified that $c'$ depends only
on $c$ and not on the particular Cauchy sequence selected, and therefore $f$
is uniquely defined for all elements of R. It is a straightforward matter to
verify that $f$ is an isomorphism. Q.E.D.


Because of the essential uniqueness of a field of real numbers, as shown by the 
preceding theorem, we now suppose one such field is fixed once for all, and we 
denote it by R. Its elements are called real numbers. A real number is called 
rational if it is in the subfield Q, otherwise irrational. 

The existence of such a field R is proved in Chap. 15. The idea of the construction
is as follows: One simply takes the set S of all Cauchy sequences in Q, defining
any two of them, say $a_1, a_2, a_3, \ldots$ and $a'_1, a'_2, a'_3, \ldots$, to be equivalent if
$\lim (a_n - a'_n) = 0$. This relation of equivalence partitions S into equivalence
classes, and these classes are taken to be the elements of R. It is easy to define
appropriate operations of addition and multiplication in the new set R.


EXERCISES 
1. Let $a = \lim a_n$ and let $b = \lim b_n$. Prove that $a = b$ if and only if
$\lim (a_n - b_n) = 0$. Use this fact in both R and R' to show that the mapping $f$
defined in the proof of Theorem 3.2 is well defined and one-to-one.

2. Prove that $f$ of Theorem 3.2 is an isomorphism.

3. Let $a$, $b$ be two real numbers, with $a < b$. Prove that there is a rational
number $c$ such that $a < c < b$.

4. Let L be a complete ordered field containing Q as a subfield and such that for
any element $c$ of L there is an integer $n > c$. Prove that L is a field of real numbers.
[Hint: Show that for $e > 0$ in L there is a rational number $a$ such that
$|a - c| < e$. By taking smaller and smaller values of $e$, for example $1/n$, conclude
that there is a sequence in Q having $c$ as limit.]


4. Some properties of R 


Consider the set S of all rational numbers $r$ such that $r^2 < 2$. Since, for example,
$(3/2)^2 > 2$, it follows that all elements of S are less than 3/2. As we have seen in
Sec. 1, there is no element in S whose square is 2, and S has no greatest element.
We shall now show that this kind of situation does not occur in the field of real
numbers R.


DEFINITION 4.1 Let S be any set of real numbers. Then a real number $b$ is called an
upper bound of S if $b \ge a$ for every element $a$ in S. A real number $c$ is called the least
upper bound (l.u.b.) of S if it is an upper bound and if $c \le b$ for every upper bound $b$
of S. [Lower bounds and the greatest lower bound (g.l.b.) are defined in an analogous
fashion.]

Observe that there cannot be two least upper bounds of S. For if $c$, $c'$ are both
l.u.b. of S, then by definition $c \le c'$ and similarly $c' \le c$, whence $c = c'$.


EXAMPLE 1 Let S be the set of all real numbers $r$ such that $0 \le r < 1$. Then 1
is an upper bound for S; so is 150,322,121.52. It is easily seen that 1 is the l.u.b. of
S. Similarly $-4000$ is a lower bound for S; so is 0, and in fact 0 is the g.l.b. of S.
Thus this S happens to contain its g.l.b. but not its l.u.b.


EXAMPLE 2 Let S be the set consisting of all positive integers $n$ and their reciprocals
$1/n$. Then S does not have an upper bound. But any number $\le 0$ is a lower
bound for S, and 0 is easily seen to be its g.l.b. It is not in S.


THEOREM 4.1 Let S be any non-empty set of real numbers. If S has an upper bound,
then it must have a least upper bound.

Proof. Let $a$ be an element of S, and let $b$ be an upper bound for S.
Thus by definition $a \le b$. By Theorem 3.1 there exist integers M and N
such that $M < a$ and $b < N$. Now for each $k = 0, 1, 2, \ldots$ the integers
$n$ for which $n/10^k$ is not an upper bound of S form a non-empty set $T_k$ (because
$10^k \cdot M$ is in $T_k$) which contains no integers greater than $10^k N$.
Therefore (by Exercise 6, Sec. 4, Chap. 2) $T_k$ has a greatest element $n_k$.
Thus $n_k$ is the largest integer such that $n_k/10^k$ is not an upper bound of S.

We now show that the sequence $n_0, n_1/10, n_2/10^2, \ldots$ is a Cauchy
sequence whose limit $c$ is the least upper bound of S.

First of all, $n_k/10^k = (10^h \cdot n_k)/10^{k+h}$, and so by definition of $n_{k+h}$ we
have $10^h \cdot n_k \le n_{k+h}$. But by definition of $n_k$ the number $(n_k + 1)/10^k$ must
be an upper bound of S, while $n_{k+h}/10^{k+h}$ is not, and so $n_{k+h}/10^{k+h} <
(n_k + 1)/10^k$. Putting $l = k + h$ we have then all together
\[ \frac{n_k}{10^k} \le \frac{n_l}{10^l} < \frac{n_k + 1}{10^k} \quad \text{for every } l \ge k \tag{4.1} \]
Therefore, if $l \ge k$,
\[ 0 \le \frac{n_l}{10^l} - \frac{n_k}{10^k} < \frac{1}{10^k} \tag{4.2} \]
Now let $e$ be any positive real number. Then there exists an integer P such
that $1/P < e$ (by Corollary 1, Theorem 3.1). Then $1/10^P < 1/P < e$,
and so there follows from (4.2)
\[ \left| \frac{n_l}{10^l} - \frac{n_k}{10^k} \right| < e \quad \text{for all } k, l > P \]
Thus the sequence $n_0, n_1/10, n_2/10^2, \ldots$ is indeed a Cauchy sequence in
R. Since R is complete, it has a limit $c = \lim_{k \to \infty} n_k/10^k$. Since the numbers
$n_k/10^k$ increase with $k$, by (4.1), it follows that
\[ \frac{n_k}{10^k} \le c \quad \text{for all } k = 0, 1, 2, \ldots \]
Suppose now that S contains an element $a > c$. Then we have also
$a > n_k/10^k$ for all $k$, from just above. We show that this is impossible.
For if $a - c > 0$, then there is an integer $m$ such that $1/m < a - c$,
whence $1/10^m < a - c$ (Corollary 1, Theorem 3.1). Then adding the
inequalities $n_m/10^m \le c$ and $1/10^m < a - c$ we get $(n_m + 1)/10^m < a$,
showing that $(n_m + 1)/10^m$ is not an upper bound for S and therefore
contradicting the definition of $n_m$. Hence we must have $a \le c$ for every
element of S, and so $c$ is an upper bound for S.

Suppose now that $c'$ is also an upper bound for S, with $c' < c$. Since
$n_k/10^k$ is not an upper bound for S, there is an element of S which is
greater than $n_k/10^k$. But it must be $\le c'$, and so we have
\[ \frac{n_k}{10^k} < c' \]
But then $c - n_k/10^k > c - c'$ for all $k$, contradicting the fact that $c =
\lim_{k \to \infty} n_k/10^k$. Hence we must have $c \le c'$, and therefore $c$ is the l.u.b. of S.
Q.E.D.


The existence of a least upper bound of any subset S of R which has an upper 
bound is actually a characteristic property of R, in the sense that it could be used 
in place of completeness in Definition 3.2. See Exercise 5 below.

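If we examine the foregoing proof carefully, it becomes evident that we have in
effect produced a decimal expansion of $c$. For let us set
\[ a_0 = n_0, \quad a_1 = n_1 - 10 n_0, \quad \ldots, \quad a_k = n_k - 10 n_{k-1}, \quad \ldots \]
Then we have
\[ a_0 + \frac{a_1}{10} + \cdots + \frac{a_k}{10^k}
 = n_0 + \frac{n_1 - 10 n_0}{10} + \frac{n_2 - 10 n_1}{10^2} + \cdots + \frac{n_k - 10 n_{k-1}}{10^k}
 = \frac{n_k}{10^k} \]
Furthermore, putting $l = k + 1$ in (4.1) we get $10 n_k \le n_{k+1} < 10 n_k + 10$, or
$0 \le n_{k+1} - 10 n_k < 10$. Therefore $0 \le a_{k+1} \le 9$ for $k \ge 0$, or $0 \le a_k \le 9$ for
$k \ge 1$.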
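The integers $n_k$ of the proof of Theorem 4.1 and the digits $a_k = n_k - 10 n_{k-1}$ can be computed explicitly for a concrete set. The sketch below assumes S is the set of numbers y with y*y < 2, the set discussed at the start of this section; the function names are hypothetical. For this S, the number $n/10^k$ fails to be an upper bound exactly when $n < 0$ or $n^2 < 2 \cdot 10^{2k}$, so the largest such n is an integer square root.

```python
from math import isqrt

def n_k(k):
    """Largest integer n such that n/10^k is not an upper bound of
    S = {y : y*y < 2}.

    n*n can never equal 2 * 10^(2k) (that would make 2 a rational square),
    so the largest n with n*n < 2 * 10^(2k) is isqrt(2 * 10^(2k)).
    """
    return isqrt(2 * 10**(2 * k))

def digits(count):
    """Return [a_0, a_1, ..., a_(count-1)] with a_0 = n_0, a_k = n_k - 10 n_(k-1)."""
    ns = [n_k(k) for k in range(count)]
    return [ns[0]] + [ns[k] - 10 * ns[k - 1] for k in range(1, count)]
```

Each value `n_k(k) / 10**k` is a truncation of the decimal expansion of the least upper bound of S, and every digit after the first lies between 0 and 9, as the inequality $10 n_k \le n_{k+1} < 10 n_k + 10$ requires.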

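Now let us assume that $c > 0$. From (4.2) it follows readily that
\[ 0 \le c - \frac{n_k}{10^k} \le \frac{1}{10^k} \quad \text{for } k = 0, 1, 2, \ldots \tag{4.3} \]
For otherwise we would have $c - n_k/10^k = 1/10^k + y$, where $y > 0$. But from
(4.2)
\[ \frac{n_l}{10^l} - \frac{n_k}{10^k} < \frac{1}{10^k} \]
this for all $l \ge k$. But then $1/10^k + y < c - n_l/10^l + 1/10^k$, whence $c -
n_l/10^l > y$ for all $l \ge k$. This contradicts the fact that $c = \lim n_l/10^l$ and
therefore proves (4.3).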

Putting $k = 0$ in (4.3) we get $0 \le c - n_0 \le 1$. Since we have assumed $c > 0$
and since $n_0$ is an integer, it follows that $n_0 = a_0 \ge 0$. Hence the numbers
$a_0, a_1, a_2, \ldots$ defined above determine a decimal $A = a_0.a_1a_2a_3\cdots$, and for
this decimal we have from above
\[ A_k = a_0 + \frac{a_1}{10} + \cdots + \frac{a_k}{10^k} = \frac{n_k}{10^k} \]
Then by definition of $c$ we have $c = \lim A_k$, and by (4.3) we have
$0 \le c - A_k \le 1/10^k$. Hence if we carry over the definition of decimal expansion
(Definition 4.2, Chap. 3) to the real-number system R, then we have proved that
A is a decimal expansion of the positive real number $c$.

Now given any positive real number $c$, it is obviously the l.u.b. of the set S
consisting of $c$ alone, and therefore the foregoing argument applies to any $c > 0$. Hence
we have the following theorem:


THEOREM 4.2 Every positive real number $c$ has a decimal expansion $A =
a_0.a_1a_2a_3\cdots$. That is, for the rational numbers
\[ A_k = a_0 + \frac{a_1}{10} + \cdots + \frac{a_k}{10^k} \]
we have $0 \le c - A_k \le 1/10^k$ for $k = 1, 2, 3$, etc. Furthermore $c = \lim_{k \to \infty} A_k$.


It follows conversely that every decimal A is the expansion of some real number.
For by Proposition 2.1 the numbers $A_1, A_2, A_3, \ldots$ form a Cauchy sequence, and
it must have a limit $c$ in R. The proof of Theorem 4.4, Chap. 3, holds good in R
as is easily seen, and it shows that decimal expansions are unique except in the case
of terminating decimals. Since only rational numbers have terminating decimals,
it follows that the decimal expansion of any irrational number is unique.


COROLLARY The set of real numbers R is uncountable.

Proof. We show in fact that the set J of real numbers $x$ such that
$0 < x < 1$ is uncountable. Suppose to the contrary that the elements of
J can be enumerated in some order, say $a, b, c, d$, etc. Represent these
numbers by their decimal expansions A, B, C, D, etc. Now select a
decimal $X = 0.x_1x_2x_3\cdots$ as follows: Choose $x_1$ to be different from the
first digit $a_1$ in A; choose $x_2$ to be different from the second digit $b_2$ of B;
then choose $x_3$ to be different from the third digit $c_3$ of C; and so forth.
This can certainly be done in such a way that X is not recurrent. Now X
must represent some real number $x$ in the set J. But X differs in at least
one place from every one of the decimals A, B, C, D, etc., and therefore
the number $x$ cannot appear in the list $a, b, c, d, \ldots$, a contradiction.
Q.E.D.
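The digit selection in this proof can be imitated on a finite list. The sketch below is only an illustration under stated assumptions: each listed decimal is given as a string of its digits after the decimal point, the function name `diagonal` and the rule "use 5, or 6 when the listed digit is 5" are hypothetical choices, and a finite example cannot exhibit the additional requirement that X be non-recurrent.

```python
def diagonal(decimals):
    """Return a digit string whose nth digit differs from the nth digit
    of the nth listed decimal.

    decimals: list of digit strings; the nth string must have at least
    n + 1 digits (counting from zero).
    """
    out = []
    for n, d in enumerate(decimals):
        # Pick 5 unless the listed digit is already 5; then pick 6.
        out.append('6' if d[n] == '5' else '5')
    return ''.join(out)
```

Whatever list is supplied, the returned string disagrees with the nth entry in its nth place, so it cannot coincide with any entry of the list.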


We now return to the question raised in Sec. 1, the existence of roots. 


THEOREM 4.3 Let $r$ be a positive real number, and let $n$ be a positive integer. Then
there is a unique positive real number $x$ such that $x^n = r$.
Proof. We first observe that it suffices to prove this for $r < 1$. For if
$r \ge 1$, let $k$ be an integer such that $k > r$. Then also $k^n > r$, and so
$r' = r/k^n$ is less than 1. If now $x'^n = r'$, then for $x = kx'$ we have $x^n =
k^n r' = r$.
Then, assuming $r < 1$, let S be the set of all numbers $y$ such that
$y^n < r$. Clearly 1 is an upper bound for S, and therefore S has a least
upper bound $x$. We show that $x^n = r$.




To do so let $a$ be a number such that $|a| < 1$. From the binomial
theorem we have
\[ (x + a)^n = x^n + n x^{n-1} a + \cdots + n x a^{n-1} + a^n
 = x^n + a \cdot \left( n x^{n-1} + \cdots + n x a^{n-2} + a^{n-1} \right) \]
Since $|a| < 1$ it is clear that the expression in parentheses is smaller than
what is obtained by replacing $a$ by 1, namely
\[ c = n x^{n-1} + \cdots + n x + 1 \]
Therefore we have
\[ |(x + a)^n - x^n| < c \cdot |a| \]
If now $x^n < r$, take $a = (r - x^n)/c$. Since $r < 1$, $c \ge 1$ it follows that
$a < 1$, and we have
\[ 0 < (x + a)^n - x^n < c \cdot \frac{r - x^n}{c} = r - x^n \]
and so $(x + a)^n < r$. Therefore $x + a = x + (r - x^n)/c$ is in the set S
and is bigger than $x$, a contradiction. Hence $x^n < r$ is impossible.

If $x^n > r$, take again $a = (r - x^n)/c$. Since 1 is an upper bound for S,
we have $x \le 1$, and so $-1 < a < 0$. Then $x + a < x$, and so $(x + a)^n <
x^n$. Hence, from above, $|(x + a)^n - x^n| = x^n - (x + a)^n < c \cdot |a| =
c \cdot (x^n - r)/c = x^n - r$, whence $-(x + a)^n < -r$, or $r < (x + a)^n$. Therefore
$x + a = x + (r - x^n)/c$ is an upper bound for S but is smaller than
$x$, a contradiction. Hence $x^n > r$ is also impossible, and so $x^n = r$. If
$0 < x' < x$, then $x'^n < x^n = r$. If $0 < x < x'$, then $r = x^n < x'^n$.
Hence $x$ is the only positive number such that $x^n = r$. Q.E.D.
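The least upper bound in this proof can be approximated by rational numbers. The following sketch uses bisection, which is a swapped-in technique and not the argument of the proof: it keeps a rational `lo` belonging to the set S of numbers y with y^n < r, and a rational `hi` that is an upper bound of S. The function name and parameters are hypothetical, and r is assumed to be a positive rational.

```python
from fractions import Fraction

def root_bounds(r, n, tol):
    """Return rationals lo < hi with lo**n < r <= hi**n and hi - lo < tol.

    Assumes r > 0 and tol > 0 are Fractions and n is a positive integer.
    """
    lo = Fraction(0)                      # 0**n = 0 < r, so 0 lies in S
    hi = max(Fraction(1), Fraction(r))    # hi >= 1 gives hi**n >= hi >= r
    while hi - lo >= tol:
        mid = (lo + hi) / 2
        if mid**n < r:
            lo = mid    # mid belongs to S
        else:
            hi = mid    # mid is an upper bound of S
    return lo, hi
```

The positive nth root of r, whose existence the theorem asserts, always lies in the returned interval, since it is the least upper bound of S.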


For any positive $r$ the unique positive number $x$ such that $x^n = r$ is usually
denoted by $r^{1/n}$, or by $\sqrt[n]{r}$. We define
\[ r^{m/n} = (r^{1/n})^m \]
for any integer $m$. Then
\[ (r^{m/n})^n = \left( (r^{1/n})^m \right)^n = (r^{1/n})^{mn} = \left( (r^{1/n})^n \right)^m = r^m \]
and so $r^{m/n} = (r^m)^{1/n}$, by uniqueness of positive nth roots. If $m/n = a/b$ ($a$, $b$
integers, with $b > 0$), then $mb = na$. Therefore $(r^{a/b})^{bn} = (r^a)^n = r^{an} = r^{mb} = (r^{m/n})^{bn}$,
showing that $r^{a/b} = r^{m/n}$, again by uniqueness. Hence the symbol $r^c$ is defined for
any rational number $c$. It is easy to verify that the usual rules for exponents
continue to hold for rational exponents.


We conclude this chapter with a brief mention of infinite series. Let $c_0, c_1, c_2, c_3$,
etc., be real numbers. Our problem is to attach some meaning to an “infinite sum”
\[ c_0 + c_1 + c_2 + c_3 + \cdots \]
or more compactly,
\[ \sum_{k=0}^{\infty} c_k \tag{4.4} \]
This is easily done in many cases as follows: Consider the sequence
$s_0 = c_0$, $s_1 = c_0 + c_1$, $s_2 = c_0 + c_1 + c_2$, $s_3 = c_0 + c_1 + c_2 + c_3$, etc.
These are called partial sums of the infinite series (4.4).


DEFINITION 4.2 If the sequence of partial sums $s_0, s_1, s_2, \ldots$ is a Cauchy sequence,
then the series (4.4) is said to converge, and its sum is defined to be $\lim s_n$.


Then we can write
\[ \sum_{k=0}^{\infty} c_k = \lim_{n \to \infty} (c_0 + c_1 + \cdots + c_n) \]
provided the conditions are fulfilled.


EXAMPLE 3 Let $x$ be a real number such that $-1 < x < 1$, and for $c_k$ above take
$x^k$. Then
\[ \sum_{k=0}^{\infty} x^k = \lim_{n \to \infty} (1 + x + x^2 + \cdots + x^n) = \lim_{n \to \infty} \frac{1 - x^{n+1}}{1 - x} = \frac{1}{1 - x} \]
The proof of this is the same as that of Proposition 2.7, save that the restriction
to rational numbers is no longer necessary. The series does not converge if $|x| \ge 1$.
The series is called the geometric series.


EXAMPLE 4 Let $A = a_0.a_1a_2a_3\cdots$ be a decimal. Take $c_k = a_k/10^k$ in (4.4).
The partial sum $c_0 + c_1 + \cdots + c_n$ is what we have called $A_n$. Hence if $r$ is the
real number represented by the decimal A, then we have (Theorem 4.2)
\[ \sum_{k=0}^{\infty} \frac{a_k}{10^k} = \lim_{n \to \infty} A_n = r \]


EXAMPLE 5 For any real number $x$ the series
\[ \sum_{k=0}^{\infty} \frac{x^k}{k!} \]
converges. The sum of the series is usually denoted by $e^x$. The operation $x \to e^x$
gives a mapping from R to itself, and it is called the exponential function. Taking
$x = 1$, the number $e^1$, or simply $e$, plays an important role in mathematics. An
approximate value for $e$ is 2.71828.
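The partial sums of the exponential series can be computed exactly for rational x. The sketch below is illustrative only; the function name `exp_partial_sum` is hypothetical, and the partial sum is taken over the terms x^k/k! for k from 0 to n.

```python
from fractions import Fraction

def exp_partial_sum(x, n):
    """s_n = sum of x^k / k! for k = 0, 1, ..., n, as an exact Fraction."""
    term = Fraction(1)    # the k = 0 term
    total = Fraction(1)
    for k in range(1, n + 1):
        term = term * x / k   # x^k / k! obtained from the previous term
        total += term
    return total
```

Taking x = 1 and a moderate n gives rational numbers close to the approximate value 2.71828 quoted above.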


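EXAMPLE 6 The two series
\[ \sum_{k=0}^{\infty} \frac{(-1)^k x^{2k}}{(2k)!} = 1 - \frac{x^2}{2!} + \frac{x^4}{4!} - \cdots \]
\[ \sum_{k=0}^{\infty} \frac{(-1)^k x^{2k+1}}{(2k+1)!} = x - \frac{x^3}{3!} + \frac{x^5}{5!} - \cdots \]
converge for any real number $x$. The sum of the first one is $\cos x$, and the sum
of the second is $\sin x$.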

It is not difficult to prove that convergent infinite series can be manipulated in 
much the same way as ordinary finite sums. These matters are treated in any 
standard calculus text. 


EXERCISES 

1. Let $r$ be a positive real number, and let $a$, $b$ be rational numbers. Prove
that $r^{a+b} = r^a r^b$ and $(r^a)^b = r^{ab}$.

2. Let $a$ be a positive rational number. Show that if $0 < r < s$, then $r^a < s^a$.

3. Let $a$, $b$ be rational numbers with $0 < a < b$. Show that $r^a < r^b$ if $r > 1$
and $r^a > r^b$ if $r < 1$.

4. Let $c_1, c_2, c_3, \ldots$ be a Cauchy sequence of rational numbers; let $r$ be a
positive real number. Show that $r^{c_1}, r^{c_2}, r^{c_3}, \ldots$ is a Cauchy sequence.

*5. Let L be an ordered field containing Q as a subfield. Suppose that every
non-empty subset of L which has an upper bound must have a least upper bound.
Furthermore, suppose that L is Archimedean; i.e., that for any element $c$ of L
there is an integer $n > c$. Prove that L is a field of real numbers.

6. Let $a_1 \le a_2 \le a_3 \le \cdots$ be a sequence of real numbers, and suppose that
there is a number $b$ such that $a_n \le b$ for all $n$. Show that $a_1, a_2, a_3, \ldots$ is a
Cauchy sequence.

7. Let S be a subset of R having a lower bound. Prove that S has a greatest
lower bound.

8. Let $m$ be a positive integer, and let $Q(\sqrt{m})$ denote the set of all numbers
$a + b\sqrt{m}$ in R, where $a$, $b$ are rational. Prove that $Q(\sqrt{m})$ is a subfield of R.

9. Analyze the method taught in high schools for extracting square roots.
Devise a similar method for extracting cube roots.

10. Let S be an infinite set of real numbers, all lying between two fixed numbers
$a$, $b$. Prove that S contains a Cauchy sequence. [This is the Bolzano-Weierstrass
theorem. Hint: Divide the interval from $a$ to $b$ into two equal parts, then repeat
that for each part, etc.]


*11. Prove that the series $\sum_{k=0}^{\infty} \dfrac{x^k}{k!}$ converges for any $x$. [Hint: Let $n$ be an
integer $> |x|$, and show that $\dfrac{|x|^k}{k!} \le \dfrac{n^n}{n!} \left( \dfrac{|x|}{n} \right)^k$ for large $k$. Then compare with
the geometric series.]


The field of complex numbers 


1. The square root of -1 †


For any nonzero element $x$ of an ordered integral domain, the element $x^2$ is always
positive. In particular this is the case for any real number, and therefore the
equation $x^2 = -1$ cannot have a solution in the field of real numbers. It was
this deficiency that led to the introduction of complex numbers into mathematics.
The present chapter is concerned with the problem of enlarging the real-number
field in such a way that $x^2 = -1$ will have a solution.


DEFINITION 1.1 A field C is called a field of complex numbers if
(1) C contains the field of real numbers R as a subfield.
(2) C contains an element, denoted by $i$, such that $i^2 = -1$.
(3) No subfield of C, other than C itself, contains both R and $i$.
In Sec. 2 we shall see that it is very easy to build such a field C out of R. Another
construction is given in Example 8, Sec. 4, Chap. 6. In this section we shall
see that any two fields C and C' satisfying Definition 1.1 are interchangeable.


THEOREM 1.1 C being a field of complex numbers, any element of C can be expressed
uniquely in the form $a + bi$, where $a$ and $b$ are real numbers.
Proof. Let F be the set of all elements $a + bi$ in C, where $a$, $b$ are real.
We show that F is a subfield of C. In fact, let $u$, $v$ be any two elements
of F. By definition of F we have
\[ u = a + bi, \quad v = c + di \quad \text{with } a, b, c, d \text{ real} \]
Then
\[ u \pm v = (a + bi) \pm (c + di) = (a \pm c) + (b \pm d)i \]
Therefore $u + v$ and $u - v$ are elements of F. Furthermore,
\[ uv = (a + bi)(c + di) = ac + adi + bci + bdi^2 = (ac - bd) + (ad + bc)i \]
Hence $uv$ is in F. If $u \ne 0$, then
\[ \frac{1}{u} = \frac{1}{a + bi} = \frac{a - bi}{(a + bi)(a - bi)} = \frac{a - bi}{a^2 + b^2} = \frac{a}{a^2 + b^2} - \frac{b}{a^2 + b^2}\, i \]
and so the inverse of a nonzero element of F is also in F. It follows
easily that F is a subfield of C. Furthermore F contains all elements
$a + 0 \cdot i = a$, where $a$ is real, and so F contains R. F also contains
$0 + 1 \cdot i = i$, and therefore by (3) of Definition 1.1 we must have F = C.
Hence every element of C can be expressed in the form $a + bi$, with $a$
and $b$ real. Now suppose that $a + bi = a' + b'i$, where $a$, $a'$, $b$, $b'$ are
real. Subtracting the two we get $(a - a') + (b - b') \cdot i = 0$. Multiplying
this by $(a - a') - (b - b') \cdot i$ we get
\[ (a - a')^2 + (b - b')^2 = 0 \]
and since $a - a'$ and $b - b'$ are real, it follows that $a - a' = 0$ and $b - b' = 0$. Q.E.D.

Hence from $a + bi = a' + b'i$ we can conclude $a = a'$, $b = b'$, provided $a$, $a'$,
$b$, $b'$ are real.

† Except for Sec. 4, the only fact needed about R is that it is an ordered field containing
the (positive) nth root of any positive element.


COROLLARY The only elements in C whose squares are equal to $-1$ are $i$ and $-i$.
Proof. Let $j$ be an element of C such that $j^2 = -1$. By the theorem
we can write $j = a + bi$, where $a$, $b$ are uniquely determined real numbers.
Then $j^2 = -1 = (a + bi)^2 = (a^2 - b^2) + 2abi$. Since $-1 = -1 + 0 \cdot i$,
it follows from the uniqueness that $a^2 - b^2 = -1$ and $2ab = 0$. From
this we see that $a = 0$ and $b = \pm 1$. Q.E.D.


We shall call i and —i imaginary untis in C. Observe that Theorem 1.1 holds 
if we replace i by —2. For if u = a + di, with a, 6 real, then also x = a! + 
&(—1), where a’ = a, b' = —b. 

The following theorem shows the essential uniqueness of fields of complex num- 
bers. 


THEOREM 1.2 Let C and C′ be two fields of complex numbers and let i and i′ be imaginary units in C, C′, respectively. Then there is one and only one isomorphism from C to C′ which sends i into i′ and sends real numbers into themselves.
Proof. If f is an isomorphism with the required properties, then for any element a + bi of C (with real a and b) we have f(a + bi) = f(a) + f(bi) = f(a) + f(b)·f(i) = a + b·f(i) = a + bi′. Hence there cannot be two such isomorphisms. Conversely if we define the mapping f from C to C′ by f(a + bi) = a + bi′, then it is easily verified, using Theorem 1.1, that f is an isomorphism of the required kind. Q.E.D.


Remark. There are many isomorphisms from C to C’ (or from C to itself) 
which do not send every real number into itself. 


As we have done with Z, Q, and R, we now suppose that some field of complex numbers is fixed once for all (for example the one constructed in Sec. 2). We denote it henceforth by $\mathbf{C}$. Its elements will be called complex numbers.

If we take $\mathbf{C}$ for both C and C′ in Theorem 1.2, and −i for i′, we get the following:


COROLLARY There is one and only one isomorphism from $\mathbf{C}$ to itself which sends i into −i and sends every real number into itself. Namely, if u = a + bi, where a and b are real, then the isomorphism sends u into a − bi.

This isomorphism is very important. Some properties of it are listed below. 
First of all, if u is a complex number and if we write u = a + bi, where a and b are real, then a is called the real part of u, written Re {u}, and b is called the imaginary part of u, written Im {u}. If b = 0, then u is of course a real number; if a = 0, then u is said to be a pure imaginary number. The isomorphism of the corollary sends u into a − bi, and that number is called the complex conjugate of u, usually denoted by $\bar{u}$. We have

$$\overline{a + bi} = a - bi \qquad \text{if } a, b \text{ are real} \tag{1.1}$$

$$\bar{i} = -i \tag{1.2}$$

$$\bar{\bar{u}} = u \qquad \text{(that is, } u \text{ is the conjugate of } \bar{u}\text{)} \tag{1.3}$$

$$\bar{u} = u \qquad \text{if and only if } u \text{ is real} \tag{1.4}$$

$$\bar{u} = -u \qquad \text{if and only if } u \text{ is pure imaginary} \tag{1.5}$$

$$u + \bar{u} = 2a = 2\,\mathrm{Re}\,\{u\}, \qquad u - \bar{u} = 2bi = 2i\,\mathrm{Im}\,\{u\} \tag{1.6}$$

$$u\bar{u} = a^2 + b^2 \tag{1.7}$$

The quantity

$$|u| = \sqrt{u\bar{u}} = \sqrt{a^2 + b^2} \tag{1.8}$$

is called the absolute value, or modulus, of u. Note that |u| = 0 if and only if u = 0. Clearly $|\bar{u}| = |u|$.

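Since any isomorphism (in particular the isomorphism $u \to \bar{u}$) is by definition compatible with addition and multiplication, we have the following rules (easily verified directly):

$$\overline{u \pm v} = \bar{u} \pm \bar{v} \tag{1.9}$$

$$\overline{uv} = \bar{u} \cdot \bar{v} \tag{1.10}$$

Now $|uv|^2 = uv \cdot \overline{uv}$, by (1.8), and $uv \cdot \overline{uv} = uv \cdot \bar{u}\bar{v} = u\bar{u} \cdot v\bar{v} = |u|^2 |v|^2$, by (1.10); hence

$$|uv| = |u| \cdot |v| \tag{1.11}$$

Furthermore

$$|u + v| \le |u| + |v| \qquad \text{(triangle inequality)} \tag{1.12}$$

To prove this we have

$$|u + v|^2 = (u + v) \cdot \overline{(u + v)} = (u + v) \cdot (\bar{u} + \bar{v}) = u\bar{u} + v\bar{v} + u\bar{v} + \bar{u}v \qquad \text{by (1.9)}$$

Put $w = u\bar{v}$. Then $\bar{w} = \bar{u}v$, by (1.10) and (1.3). Therefore $u\bar{v} + \bar{u}v = w + \bar{w} = 2\,\mathrm{Re}\,\{w\}$, by (1.6). It is plain from (1.8) that $|\mathrm{Re}\,\{w\}| \le |w|$. But by (1.11) we have $|w| = |u| \cdot |\bar{v}| = |u| \cdot |v|$. Therefore we get

$$|u + v|^2 \le |u|^2 + 2|u| \cdot |v| + |v|^2 = (|u| + |v|)^2$$

from which (1.12) follows.
By a similar argument it is not hard to prove that

$$|u - v| \ge \bigl|\,|u| - |v|\,\bigr| \tag{1.13}$$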
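The rules (1.8) and (1.11) to (1.13) can be spot-checked numerically. The following is only a sketch using Python's built-in complex type as a stand-in for the field of complex numbers; the helper name `modulus` and the sample values of u and v are our own choices, and floating-point agreement is of course no substitute for the proofs above.

```python
import math

def modulus(u: complex) -> float:
    """|u| = sqrt(u * conj(u)) = sqrt(a^2 + b^2), as in (1.8)."""
    return math.sqrt((u * u.conjugate()).real)

# Sample values chosen only for illustration.
u, v = complex(3, 4), complex(-1, 2)

product_rule = math.isclose(modulus(u * v), modulus(u) * modulus(v))  # (1.11)
triangle = modulus(u + v) <= modulus(u) + modulus(v)                   # (1.12)
reverse_triangle = modulus(u - v) >= abs(modulus(u) - modulus(v))      # (1.13)
```

Each of the three flags should come out true for any choice of u and v, up to rounding in the first one.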


THEOREM 1.3 Let $a_0, a_1, a_2, \ldots, a_n$ be real numbers. If u is a complex number satisfying the equation

$$a_0 x^n + a_1 x^{n-1} + \cdots + a_{n-1} x + a_n = 0$$

then so is $\bar{u}$.†


Proof. Set $w = a_0 u^n + a_1 u^{n-1} + \cdots + a_n$. By assumption, w = 0. But from (1.9), (1.10) we have $\bar{w} = \bar{a}_0 \bar{u}^n + \bar{a}_1 \bar{u}^{n-1} + \cdots + \bar{a}_n$. Since the a's are real and $\bar{w} = 0$, the assertion follows. Q.E.D.
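As an illustration of Theorem 1.3, one can evaluate a polynomial with real coefficients at a complex root and at its conjugate. The sketch below is in Python; the function name `evaluate` and the sample polynomial $x^2 - 2x + 5$ with root 1 + 2i are our own choices, not taken from the text.

```python
def evaluate(coeffs, x):
    """Evaluate a0*x^n + a1*x^(n-1) + ... + an by Horner's rule (a0 first)."""
    value = 0
    for a in coeffs:
        value = value * x + a
    return value

coeffs = [1, -2, 5]            # x^2 - 2x + 5, real coefficients
u = complex(1, 2)              # a root: (1 + 2i)^2 - 2(1 + 2i) + 5 = 0

at_u = evaluate(coeffs, u)
at_conjugate = evaluate(coeffs, u.conjugate())
```

Both values vanish, as the theorem predicts. With a non-real coefficient the second value would in general be nonzero.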


The following theorem expresses a very remarkable property of the field C. 
The theorem is usually called the fundamental theorem of algebra. It is no longer 
considered to have that exalted status, but it is nonetheless very important. Its 
proof is beyond the scope of this book but can be found in any standard text on 
the theory of functions of a complex variable. 


THEOREM 1.4 Let $a_0, a_1, \ldots, a_n$ be any complex numbers, with n > 0 and $a_0 \ne 0$. Then the equation

$$a_0 x^n + a_1 x^{n-1} + \cdots + a_{n-1} x + a_n = 0$$

has a solution in C.‡

As we shall see in the following chapter, it is easy to deduce from this that the equation has exactly n solutions (some of them possibly repeated). For this reason the field C is said to be algebraically closed. It can be shown that every field can be embedded as a subfield in an algebraically closed field.


The field C cannot be an ordered field, since $i^2 = -1 < 0$. Nonetheless it is possible to define a "distance" between any two elements u and v of C, namely, the real number |u − v|. A geometrical interpretation of this is given in Sec. 3. In Sec. 4 we show that it can be used to define Cauchy sequences in C.


† We mean that, if the equation becomes true when x is replaced by u, then it becomes true when x is replaced by the conjugate of u.

‡ That is, there is a number in C such that the equation becomes true when x is replaced by that number.


EXERCISES 
1. Express the following complex numbers in the form a + bi, with a and b real:


@ @ +88? 0) 06 +3408 -9 
(e) 1a Ory 
1 

ca Op 

Pe : 
w Ee » 8 a 

142% 
© erp 


2. Show that the set of all 2 × 2 matrices of the form

$$\begin{pmatrix} a & b \\ -b & a \end{pmatrix}$$

with a and b real numbers, is a field of complex numbers if multiplication and addition are defined as in Exercise 3, Sec. 2, Chap. 2.

3. Let a be a positive real number. Show that $\pm i\sqrt{a}$ are the only two solutions in C of the equation $x^2 + a = 0$.

4. Let m be a positive integer, and write $\sqrt{-m}$ for $i\sqrt{m}$ (see Example 3). Let $Q(\sqrt{-m})$ be the subset of C consisting of all numbers $a + b\sqrt{-m}$, where a, b are rational numbers. Prove that $Q(\sqrt{-m})$ is a subfield of C.

5. Find two solutions in C of the equation 2? + 7 = 0. 

6. Show that the fields Q and R are not algebraically closed.


2. A construction of C; quaternions 


Here we shall give a very simple construction of a field of complex numbers from R. It can be taken for our fixed complex field C. The construction is very closely related to Exercise 2 above.

Let C denote the set of all ordered pairs of real numbers (a, b). We define two binary operations in the set C by the rules

$$(a, b) + (c, d) = (a + c,\ b + d)$$
$$(a, b) \cdot (c, d) = (ac - bd,\ ad + bc) \tag{2.1}$$


where in the right-hand members are indicated operations in R. The two formulas above are not really very mysterious; the attentive reader will observe a great similarity between them and some of the calculations occurring in the proof of Theorem 1.1.

It is a perfectly straightforward matter to verify that C with the two operations of (2.1) is a commutative ring. That is, it satisfies Definition 2.1, Chap. 2, and multiplication is commutative. The zero element of C is the pair (0, 0); the inverse of an element (a, b) for addition is (−a, −b). The unit element of C is the pair (1, 0). But C is in fact a field, for if (a, b) ≠ (0, 0), then it is easily verified, using (2.1), that the element

$$\left( \frac{a}{a^2 + b^2},\ \frac{-b}{a^2 + b^2} \right)$$

is the inverse of (a, b) for multiplication.

Now let R′ be the subset of C consisting of all pairs (a, 0). From (2.1) we have (a, 0) + (b, 0) = (a + b, 0) and (a, 0) · (b, 0) = (ab, 0). It follows readily that R′ is a subfield of C and that the mapping R → R′ defined by a → (a, 0) is an isomorphism.

The field C just defined does not contain R itself, but it contains an isomorphic copy R′ of R, which is just as good. If we want to be very meticulous here we can simply replace each element (a, 0) of C by a itself, and in that way we obtain a field which really contains the original R as a subfield. We shall not bother about this point.

Now in C we have $(0, 1)^2 = (0, 1) \cdot (0, 1) = (-1, 0)$, by (2.1), and (−1, 0) is the same as −(1, 0), being the inverse for addition of the unit element (1, 0) of C. Therefore if we put i = (0, 1), then we have $i^2 = -(1, 0)$, or simply $i^2 = -1$ if we agree to replace (1, 0) by 1. Furthermore it follows from (2.1) that

$$(a, b) = (a, 0) + (b, 0) \cdot (0, 1)$$

or

$$(a, b) = (a, 0) + (b, 0) \cdot i$$

Again if we agree to replace (a, 0) and (b, 0) by a and b, this becomes (a, b) = a + bi. It follows easily that C satisfies Definition 1.1 and so is a field of complex numbers.
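The construction by ordered pairs translates almost word for word into a small program. The following Python sketch is only an illustration: the class name `Pair` is ours, and exact rational numbers (`fractions.Fraction`) are used as a stand-in for the real numbers, so it models pairs of rationals rather than all of C.

```python
from fractions import Fraction

class Pair:
    """An ordered pair (a, b) with the two operations of (2.1)."""

    def __init__(self, a, b):
        self.a, self.b = Fraction(a), Fraction(b)

    def __add__(self, other):
        return Pair(self.a + other.a, self.b + other.b)

    def __mul__(self, other):
        return Pair(self.a * other.a - self.b * other.b,
                    self.a * other.b + self.b * other.a)

    def inverse(self):
        n = self.a ** 2 + self.b ** 2      # nonzero unless the pair is (0, 0)
        return Pair(self.a / n, -self.b / n)

    def __eq__(self, other):
        return (self.a, self.b) == (other.a, other.b)

    def __repr__(self):
        return f"({self.a}, {self.b})"

i = Pair(0, 1)      # the element called i in the text
one = Pair(1, 0)    # the unit element
z = Pair(3, -2)     # an arbitrary sample element
```

With these definitions `i * i` equals `Pair(-1, 0)`, `z * z.inverse()` equals `one`, and `Pair(3, 0) + Pair(-2, 0) * i` equals `z`, mirroring the identity (a, b) = (a, 0) + (b, 0)·i.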


Let a, b be two real numbers. In C the expression $a^2 + b^2$ can be factored into a product of two terms containing only first powers, namely, $a^2 + b^2 = (a + bi)(a - bi)$. But in general $a^2 + b^2 + c^2$ or $a^2 + b^2 + c^2 + d^2$ cannot be factored in this way. By a procedure analogous to that above, one can construct a system U in which such expressions can be factored. Namely, let U be the set of all quadruples (a, b, c, d) of real numbers. Define addition in U by

$$(a, b, c, d) + (a', b', c', d') = (a + a',\ b + b',\ c + c',\ d + d')$$

entirely analogous to the first equation of (2.1). Multiplication in U is somewhat more complicated. A formula similar to the second one of (2.1) can be written down. Without actually doing that (it is rather lengthy) we can describe the result as follows: Write e = (1, 0, 0, 0), j = (0, 1, 0, 0), k = (0, 0, 1, 0), and l = (0, 0, 0, 1). Then the rule gives the following:


$$e \cdot u = u \cdot e = u$$

for any element u = (a, b, c, d) in U. Hence e is the unit element in U. Further,

$$j^2 = k^2 = l^2 = -1, \qquad jk = -kj = l, \qquad kl = -lk = j, \qquad lj = -jl = k \tag{2.2}$$

and if u = (a, b, c, d) is any element of U, then (a, b, c, d) = (a, 0, 0, 0) + (b, 0, 0, 0)·j + (c, 0, 0, 0)·k + (d, 0, 0, 0)·l, and zu = uz for any z of the type z = (r, 0, 0, 0). It turns out furthermore that the distributive law holds. Then using these equations it is easy to calculate any product in U. It is not hard to verify that U is a ring and that the elements (a, 0, 0, 0) form a subring which is isomorphic to R. For simplicity of notation let us therefore write a for (a, 0, 0, 0). In particular instead of e we write simply 1. Then the last equation above is

$$(a, b, c, d) = a + bj + ck + dl$$

Then, for example, we have

$$(0, 3, 2, 0) \cdot (1, 1, 0, -1) = (3j + 2k) \cdot (1 + j - l)$$

Using the distributive law and the fact that any element a = (a, 0, 0, 0) commutes with every element of U, we get

$$3j + 3j^2 - 3jl + 2k + 2kj - 2kl$$

Using (2.2), this becomes 3j − 3 + 3k + 2k − 2l − 2j = −3 + j + 5k − 2l = (−3, 1, 5, −2). We have in general

$$(a, b, c, d) \cdot (a, -b, -c, -d) = (a + bj + ck + dl)(a - bj - ck - dl) = a^2 + b^2 + c^2 + d^2$$

as is easily verified, and therefore $a^2 + b^2 + c^2 + d^2$ can be factored in U in the manner indicated. Moreover if (a, b, c, d) ≠ (0, 0, 0, 0), then from the equation above there follows

$$(a, b, c, d)^{-1} = \left( \frac{a}{r},\ \frac{-b}{r},\ \frac{-c}{r},\ \frac{-d}{r} \right)$$

where

$$r = a^2 + b^2 + c^2 + d^2$$


Hence every nonzero element of U has an inverse for multiplication. Conse- 
quently U satisfies all the axioms for a field except one: Multiplication in U is 
not commutative in general. A system of this sort is called a skew field or a divi- 
sion algebra. 

All the elements (a, b, 0, 0) = a + bj in U form a subfield isomorphic to C; so do all the elements (a, 0, b, 0) and also all the elements (a, 0, 0, b). U is called the system of quaternions. It was discovered by K. F. Gauss (1777-1855) and, independently, by Sir William Rowan Hamilton (1805-1865).

We observe from (2.2) that −1 has more than two square roots in U, namely, ±j, ±k, ±l. This is a consequence of the fact that multiplication in U is not commutative, as we shall see in the next chapter.
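The text does not write out the general product formula in U. The sketch below, in Python, uses the expansion one obtains from (2.2) together with the distributive law; the function name `qmul` and the tuple representation are our own, and the formula should be read as our expansion under those assumptions rather than as a formula quoted from the text.

```python
def qmul(p, q):
    """Product of quadruples (a, b, c, d) = a + bj + ck + dl, expanded via (2.2)."""
    a1, b1, c1, d1 = p
    a2, b2, c2, d2 = q
    return (a1 * a2 - b1 * b2 - c1 * c2 - d1 * d2,   # real part
            a1 * b2 + b1 * a2 + c1 * d2 - d1 * c2,   # coefficient of j (kl = j, lk = -j)
            a1 * c2 - b1 * d2 + c1 * a2 + d1 * b2,   # coefficient of k (lj = k, jl = -k)
            a1 * d2 + b1 * c2 - c1 * b2 + d1 * a2)   # coefficient of l (jk = l, kj = -l)

j, k, l = (0, 1, 0, 0), (0, 0, 1, 0), (0, 0, 0, 1)
minus_one = (-1, 0, 0, 0)

worked_example = qmul((0, 3, 2, 0), (1, 1, 0, -1))
```

Here `worked_example` reproduces the product (−3, 1, 5, −2) computed by hand above, `qmul(j, k)` and `qmul(k, j)` differ in sign, and each of j, k, l squares to `minus_one`.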


EXERCISE 


Show that the set of all 2 × 2 complex matrices of the form

$$\begin{pmatrix} a + bi & c + di \\ -c + di & a - bi \end{pmatrix} \qquad a, b, c, d \text{ real}$$

forms a system isomorphic to U, multiplication and addition being defined as in Example 8, Sec. 2, Chap. 2.


3. A geometric interpretation of addition 
and multiplication of complex numbers 


In this section we shall depart temporarily from our axiomatic development. We 
shall borrow some simple results from geometry and trigonometry in order to 
show how operations with complex numbers can be represented pictorially. 

If x and y are real numbers, then we can think of them as the coordinates of a point in the plane, and consequently we can represent the complex number x + iy by that point. For example, the numbers 1, i, 2 + i and their conjugates −i, 2 − i are so represented in Fig. 1. In general we think of x in x + iy as the horizontal coordinate and of y as the vertical coordinate.

If z = x + iy and w = u + iv are two complex numbers, with x, y, u, and v real, then z + w = (x + u) + i(y + v), and it is easy to see that this corresponds to Fig. 2. Thus the point representing z + w is obtained by completing the parallelogram whose first three vertices are 0, z, w.

Multiplication is a little more complicated. Recall that if z = x + iy (x and y real), then $\bar{z} = x - iy$ and $|z| = \sqrt{z\bar{z}} = \sqrt{x^2 + y^2}$ (see Sec. 1). Consider Fig. 3 (in which we assume for simplicity that x, y are positive).


[Figure 1: the points 1, i, 2 + i, −i, 2 − i plotted in the plane]
From trigonometry we have

$$x = r\cos\theta, \qquad y = r\sin\theta, \qquad r = \sqrt{x^2 + y^2} = |z| \tag{3.1}$$

The angle† θ (determined only up to multiples of 2π) is called the argument of z, denoted by arg z.
From (3.1) we observe that z can be written in the form

$$z = r(\cos\theta + i\sin\theta) \tag{3.2}$$

and we call this the polar form for z. For example, z = −1 has argument π, and its polar form is

$$-1 = 1 \cdot (\cos\pi + i\sin\pi)$$

The argument of $-\tfrac{1}{2} + i(\sqrt{3}/2)$ is 2π/3, as is easily seen, and its absolute value is $\sqrt{\tfrac{1}{4} + \tfrac{3}{4}} = 1$. Hence,

$$-\frac{1}{2} + i\,\frac{\sqrt{3}}{2} = \cos\frac{2\pi}{3} + i\sin\frac{2\pi}{3}$$

Similarly for the conjugate we have

$$-\frac{1}{2} - i\,\frac{\sqrt{3}}{2} = \cos\frac{4\pi}{3} + i\sin\frac{4\pi}{3}$$

Now let $z_1 = x_1 + iy_1$ and $z_2 = x_2 + iy_2$ be two complex numbers and write them in polar form:

$$z_1 = r_1(\cos\theta_1 + i\sin\theta_1), \qquad z_2 = r_2(\cos\theta_2 + i\sin\theta_2)$$

Then

$$z_1 z_2 = r_1 r_2[(\cos\theta_1\cos\theta_2 - \sin\theta_1\sin\theta_2) + i(\sin\theta_1\cos\theta_2 + \cos\theta_1\sin\theta_2)]$$

† We use only "radian measure" here. For example, 90° is π/2, etc.


[Figure 2: the parallelogram with vertices 0, z, w, and z + w]

[Figure 3: the point z = x + iy, with r = |z| and angle θ]

From the well-known formulas from trigonometry for $\cos(\theta_1 + \theta_2)$ and $\sin(\theta_1 + \theta_2)$ there follows

$$z_1 z_2 = r_1 r_2[\cos(\theta_1 + \theta_2) + i\sin(\theta_1 + \theta_2)]$$

Hence we have the following theorem.


THEOREM 3.1 For any two complex numbers $z_1$ and $z_2$,

$$|z_1 z_2| = |z_1| \cdot |z_2|$$

and

$$\arg(z_1 z_2) = \arg z_1 + \arg z_2$$

(The first equality was proved in Sec. 1.)


From this it is easily seen how to find the point representing $z_1 z_2$ geometrically: Just add the two angles and multiply the two absolute values.


THEOREM 3.2 (De Moivre's theorem) For any integer n,

$$(\cos\theta + i\sin\theta)^n = \cos n\theta + i\sin n\theta$$

Proof. For positive n this follows at once by induction from Theorem 3.1. For n = 0 it is trivial. For negative integers we proceed as follows: Put $z = \cos\theta + i\sin\theta$. Then $|z|^2 = \cos^2\theta + \sin^2\theta = 1$, and so $z\bar{z} = 1$. Thus $1/z = \bar{z}$, and so $1/z^n = \bar{z}^n$. If now n is a positive integer, then we have $z^{-n} = \bar{z}^n = (\cos\theta - i\sin\theta)^n = (\cos(-\theta) + i\sin(-\theta))^n$, and by what has already been shown, the latter is equal to $\cos(-n\theta) + i\sin(-n\theta)$. Q.E.D.
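De Moivre's theorem is easy to test numerically for particular angles and exponents, including negative ones. The following Python sketch computes both sides with floating-point arithmetic; the function name `de_moivre_sides` and the sample angle are our own, and agreement up to rounding is an illustration, not a proof.

```python
import math

def de_moivre_sides(theta: float, n: int):
    """Return both sides of (cos t + i sin t)^n = cos nt + i sin nt."""
    left = complex(math.cos(theta), math.sin(theta)) ** n
    right = complex(math.cos(n * theta), math.sin(n * theta))
    return left, right

# Largest discrepancy over a few exponents, for one sample angle.
largest_gap = max(abs(left - right)
                  for left, right in (de_moivre_sides(0.7, n) for n in (-3, 0, 1, 5)))
```

The value of `largest_gap` should be of the order of the rounding error of the machine.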


We point out again that we have departed from our axiomatic presentation in 
our use of some results of geometry and trigonometry—in particular the addition 
formulas for the sine and cosine. It is quite possible to give completely rigorous 
accounts of those subjects, and there is no doubt about the validity of our con- 
clusions. 

De Moivre's theorem has an important application: It tells us how to find nth 
roots of any complex number. 




THEOREM 3.3 Let $a = r(\cos\theta + i\sin\theta)$ be any complex number and n a positive integer. Then the numbers

$$r^{1/n}\left( \cos\frac{\theta + 2\pi k}{n} + i\sin\frac{\theta + 2\pi k}{n} \right) \qquad k = 0, 1, \ldots, n - 1 \tag{3.3}$$

satisfy the equation $x^n = a$, and they are the only complex numbers that do satisfy it.


Proof. By De Moivre's theorem the nth power of (3.3) is $r(\cos(\theta + 2\pi k) + i\sin(\theta + 2\pi k))$, and this is just $r(\cos\theta + i\sin\theta)$ because $\cos(\theta + 2\pi k) = \cos\theta$ for any integer k, and $\sin(\theta + 2\pi k) = \sin\theta$. Observe that if a ≠ 0, then (3.3) gives n different nth roots of a. If we plot these numbers in the plane, then they are represented by n points spaced equally round a circle of radius $r^{1/n}$. Now let $y = r_1(\cos\alpha + i\sin\alpha)$ be a number such that $y^n = a$. By De Moivre's theorem we have then $a = r(\cos\theta + i\sin\theta) = r_1^n(\cos n\alpha + i\sin n\alpha)$. Taking absolute values of both sides we get $r_1^n = r$. Since $r_1$ is assumed non-negative in the polar form, we must have $r_1 = r^{1/n}$. But then (if a ≠ 0) we must have $\cos\theta = \cos n\alpha$ and $\sin\theta = \sin n\alpha$. It is easily seen from this that y must appear among the numbers (3.3). (In the next chapter we shall prove in general that $x^n = a$ cannot have more than n solutions in any field.) Q.E.D.
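Formula (3.3) gives a direct recipe for computing nth roots. The following Python sketch uses the standard `cmath.polar` and `cmath.rect` functions to pass between a complex number and its polar form; the function name `nth_roots` and the sample value a = −8 are our own choices, and the results are floating-point approximations.

```python
import cmath
import math

def nth_roots(a: complex, n: int):
    """The n numbers of formula (3.3) for a = r(cos theta + i sin theta)."""
    r, theta = cmath.polar(a)
    return [cmath.rect(r ** (1.0 / n), (theta + 2 * math.pi * k) / n)
            for k in range(n)]

roots = nth_roots(-8, 3)   # the three cube roots of -8
```

One of the three values is (approximately) the real number −2, and the other two are a pair of complex conjugates, spaced equally round the circle of radius 2.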


EXERCISES 


1. Find all solutions in C of the following equations: 


(@) 2 =1 (b) 2 = -1 
() @ =i @ &=i+v3 


2. Let n be a positive integer and put $w = \cos(2\pi/n) + i\sin(2\pi/n)$. Show that $1, w, w^2, \ldots, w^{n-1}$ are all solutions of $x^n = 1$.

3. Show that $1 + w + w^2 + \cdots + w^{n-1} = 0$, where w is defined in Exercise 2.

4. Show that if b is one solution of the equation $x^n = a$ (a any complex number), then the other solutions are $bw, bw^2, \ldots, bw^{n-1}$, where w is as in Exercise 2.


5. If $1, w, w^2$ are the solutions of $x^3 = 1$ (n = 3 in Exercise 2), then show that

(a) $(1 + w^2)^4 = w$
(b) $(1 - w + w^2)(1 + w - w^2) = 4$
(c) $(1 - w)(1 - w^2)(1 - w^4)(1 - w^5) = 9$


6. Give geometric proofs of the inequalities

$$|z_1 + z_2| \le |z_1| + |z_2|$$
$$|z_1 - z_2| \ge \bigl|\,|z_1| - |z_2|\,\bigr|$$

for any complex numbers $z_1, z_2$.

7. Give necessary and sufficient conditions for the equality

$$|z_1 + z_2| = |z_1| + |z_2|$$

8. What is the distance between the two points in the plane representing two complex numbers u and v?

9. Prove that $|z + z'|^2 + |z - z'|^2 = 2|z|^2 + 2|z'|^2$ for any complex numbers z, z′. What is the geometric interpretation of this identity?

10. Compute the sum $1 + 2\cos x + 2\cos 2x + \cdots + 2\cos nx$. [Hint: $\cos kx$ = real part of $(\cos x + i\sin x)^k$.]

11. Determine n complex numbers x such that $(x + i)^n + (x - i)^n = 0$.


4. Cauchy sequences and infinite series in C


The absolute value |z| of a complex number z is a real number, positive unless z = 0. If z, w are two complex numbers, then the quantity |z − w| can be interpreted as the "distance" between them, and from Sec. 3 this distance has a simple geometric meaning. By using it we can carry over bodily the definitions and main properties of Cauchy sequences as given in Chap. 4. For the sake of brevity we omit many minor details.


DEFINITION 4.1 Let $c_1, c_2, c_3, \ldots$ be an infinite sequence of complex numbers. It is called a Cauchy sequence if for every real number ε > 0 there is an integer P (depending in general upon ε) such that

$$|c_n - c_m| < \varepsilon \qquad \text{for all } n, m > P$$


DEFINITION 4.2 Let $c_1, c_2, c_3, \ldots$ be a sequence of complex numbers and let c be a complex number. Then the sequence is said to have c as a limit, in symbols $\lim c_n = c$, if for every positive real number ε there is an integer P such that

$$|c_n - c| < \varepsilon \qquad \text{for all } n > P$$


It is clear that if the c's happen to be real numbers, then these definitions coincide with those of Sec. 2, Chap. 4.


PROPOSITION 4.1 Let $c_1, c_2, c_3, \ldots$ be a sequence of complex numbers, and write $c_n = a_n + b_n i$, where $a_n$ and $b_n$ are real. Then the given sequence $c_1, c_2, c_3, \ldots$ is a Cauchy sequence if and only if both $a_1, a_2, a_3, \ldots$ and $b_1, b_2, b_3, \ldots$ are Cauchy sequences.


Proof. This is a straightforward consequence of the simple inequalities

$$|a_n - a_m|,\ |b_n - b_m| \le |c_n - c_m| \le |a_n - a_m| + |b_n - b_m|$$


COROLLARY The sequence $c_1, c_2, c_3, \ldots$ has a limit c if and only if it is a Cauchy sequence, and c is unique.

The proof follows easily from the proposition and is left as an exercise. 

Just as in the case of the real numbers, these notions allow us to define infinite 
sums under certain circumstances. 


DEFINITION 4.3 Let $u_0, u_1, u_2, \ldots$ be a sequence of complex numbers. Then the infinite series $\sum_{n=0}^{\infty} u_n$ is said to converge if the sequence $s_0 = u_0$, $s_1 = u_0 + u_1$, $s_2 = u_0 + u_1 + u_2$, ..., $s_n = u_0 + u_1 + \cdots + u_n$, etc., is a Cauchy sequence. If that is so, then the number $\lim s_n$ is defined to be the sum of the series.


Hence for brevity we can write

$$\sum_{n=0}^{\infty} u_n = \lim_{n \to \infty} (u_0 + u_1 + \cdots + u_n)$$

provided that the series converges.


EXAMPLE 1 If z is any complex number with |z| < 1, then the series $\sum_{n=0}^{\infty} z^n$ converges and its sum is 1/(1 − z). (This series is called the geometric series.)
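The convergence in Example 1 can be observed by computing partial sums. The sketch below is in Python; the function name `geometric_partial_sum`, the sample value z = 0.3 + 0.4i (for which |z| = 0.5), and the number of terms are our own choices.

```python
def geometric_partial_sum(z: complex, terms: int) -> complex:
    """s = 1 + z + z^2 + ... + z^(terms - 1), accumulated term by term."""
    total, power = 0j, 1 + 0j
    for _ in range(terms):
        total += power
        power *= z
    return total

z = complex(0.3, 0.4)                 # |z| = 0.5 < 1
approx = geometric_partial_sum(z, 60)
exact = 1 / (1 - z)
gap = abs(approx - exact)
```

The difference `gap` is already negligible after 60 terms, since the omitted tail has absolute value at most $|z|^{60} / |1 - z|$.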


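EXAMPLE 2 For every complex z the series

$$\sum_{n=0}^{\infty} \frac{z^n}{n!}$$

converges. Its value is denoted by $e^z$; it is called the exponential function. Then for two complex numbers z, w we have

$$e^{z+w} = \sum_{n=0}^{\infty} \frac{(z + w)^n}{n!}$$

By applying the binomial theorem to $(z + w)^n$, we get

$$e^{z+w} = \sum_{n=0}^{\infty} \sum_{k=0}^{n} \frac{z^k w^{n-k}}{k!\,(n-k)!}$$

Using this it is not difficult to prove that

$$e^{z+w} = e^z \cdot e^w \tag{4.2}$$


EXAMPLE 3 For every complex number z the two series

$$\sum_{n=0}^{\infty} \frac{(-1)^n z^{2n}}{(2n)!}, \qquad \sum_{n=0}^{\infty} \frac{(-1)^n z^{2n+1}}{(2n+1)!} \tag{4.3}$$

converge. The sum of the first one is denoted by cos z and the sum of the second by sin z.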

It is clear that if z happens to be a real number, then all the examples above 
coincide with those given at the end of Chap. 4. 


If in the series $e^z = \sum_{n=0}^{\infty} z^n / n!$ we replace z by iz, then it is easy to see that the terms for even n are the same as the terms of the first series in (4.3); for odd n the terms are the same as the terms in the second series of (4.3), multiplied by i. From this it is simple to show that

$$e^{iz} = \cos z + i\sin z \tag{4.4}$$

Of course it is far from obvious that cos z and sin z as defined here have anything to do with the sine and cosine functions of elementary trigonometry. We refer to standard calculus texts for a proof that they are the same things when z is real. However it is easy to see that they have some of the correct properties. For example, from (4.3) it follows readily that cos(−z) = cos z and sin(−z) = −sin z. Hence, by (4.4), $e^{-iz} = \cos z - i\sin z$. Then, using (4.2) and the fact that $e^0 = 1$ we get

$$1 = e^0 = e^{iz} e^{-iz} = (\cos z + i\sin z)(\cos z - i\sin z)$$

whence

$$\cos^2 z + \sin^2 z = 1 \qquad \text{for all } z \tag{4.5}$$

Now if z is real, then so are cos z and sin z because then all the terms in (4.3) are real. From (4.5) we conclude that

$$-1 \le \cos z \le 1, \qquad -1 \le \sin z \le 1 \qquad \text{if } z \text{ is real} \tag{4.6}$$

Furthermore, putting z + w for z in (4.4) and using (4.2) we get $e^{i(z+w)} = e^{iz} e^{iw}$, or

$$\cos(z + w) + i\sin(z + w) = (\cos z + i\sin z)(\cos w + i\sin w)$$
$$= (\cos z\cos w - \sin z\sin w) + i(\sin z\cos w + \cos z\sin w)$$

If z and w are real then all the quantities appearing here, except i, are real. Hence the real and imaginary parts of both sides must be equal, and we get

$$\cos(z + w) = \cos z\cos w - \sin z\sin w$$
$$\sin(z + w) = \sin z\cos w + \cos z\sin w$$

These equations are also valid if z, w are not real. They are of course the usual addition formulas for the sine and cosine. From the series (4.3) themselves we cannot prove that cos π = −1 or sin π = 0, etc., for it is difficult to connect the two series with the number π.
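The identities (4.4) and (4.5) and the addition formula for the cosine can be checked numerically from truncated series, for complex as well as real arguments. The following Python sketch is an illustration only: the function names, the numbers of terms, and the sample values of z and w are our own, and a truncated sum merely approximates the limit defined above.

```python
from math import factorial

def exp_series(z, terms=40):
    """Partial sum of the series for e^z."""
    return sum(z ** n / factorial(n) for n in range(terms))

def cos_series(z, terms=20):
    """Partial sum of the first series in (4.3)."""
    return sum((-1) ** n * z ** (2 * n) / factorial(2 * n) for n in range(terms))

def sin_series(z, terms=20):
    """Partial sum of the second series in (4.3)."""
    return sum((-1) ** n * z ** (2 * n + 1) / factorial(2 * n + 1) for n in range(terms))

z, w = complex(0.5, -1.2), complex(-0.7, 0.3)   # sample non-real arguments

euler_gap = abs(exp_series(1j * z) - (cos_series(z) + 1j * sin_series(z)))   # (4.4)
pythagoras_gap = abs(cos_series(z) ** 2 + sin_series(z) ** 2 - 1)            # (4.5)
addition_gap = abs(cos_series(z + w)
                   - (cos_series(z) * cos_series(w) - sin_series(z) * sin_series(w)))
```

All three gaps should be of the order of rounding error for arguments of moderate size.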

The examples just cited are very important ones, and they serve to give some 
indication of the great power of Cauchy sequences, limits, and infinite series. 
These concepts are exploited very fully in the part of mathematics called analysis. We shall not have occasion to refer to them again except in Chap. 12.


EXERCISES


1. Let $\sum_{n=0}^{\infty} a_n$, $\sum_{n=0}^{\infty} b_n$ be two convergent series of complex numbers, and let c, d be any complex numbers. Prove that $\sum_{n=0}^{\infty} (ca_n + db_n)$ is convergent and that its sum is equal to

$$c \cdot \sum_{n=0}^{\infty} a_n + d \cdot \sum_{n=0}^{\infty} b_n$$

2. Let $\sum_{n=0}^{\infty} a_n$ be a convergent series of complex numbers. Prove that $\lim_{n \to \infty} a_n = 0$. Prove that the series $\sum_{n=k}^{\infty} a_n$ converges, for any positive integer k. Denoting its sum by $c_k$, prove that $\lim_{k \to \infty} c_k = 0$.

3. Prove that if $\sum_{n=0}^{\infty} a_n$ is a convergent series whose terms are all positive real numbers, then any series $\sum_{n=0}^{\infty} b_n$ such that $|b_n| \le a_n$ for all n is also convergent.

4. Prove that $\sum_{n=0}^{\infty} a_n$ converges if the set of numbers $s_n = |a_0| + |a_1| + \cdots + |a_n|$ has an upper bound.

5. Give a complete proof of Proposition 4.1 and its corollary.

*6. Show how to define Cauchy sequences and infinite sums in the system of 
quaternions. 


Polynomials 


1. Introduction 
On several occasions we have had to deal with equations of the type

$$a_0 x^n + a_1 x^{n-1} + \cdots + a_{n-1} x + a_n = 0 \tag{1.1}$$

in which $a_0, a_1, \ldots, a_n$ denote elements of some field and x denotes an "unknown." For example, at the beginning of Chap. 5 we pointed out that there is no real number satisfying the equation $x^2 + 1 = 0$. Here obviously x does not represent an element of the field R because there is no such element in R; x is merely a letter used to allow us to write out an equation which might or might not become true if x is replaced by some element of the system. Similar remarks apply to the equations occurring in Theorems 1.3 and 1.4, Chap. 5. It is sometimes convenient to be able to consider x as an element of an algebraic system, rather than as just a letter to be used for writing equations. The present chapter is concerned with showing how that can be done and with the algebraic systems that result.


2. Indeterminates, or variables 


If $a_0, a_1, \ldots, a_n$ and t are any elements of a ring A, then a combination of the form

$$a_0 + a_1 t + a_2 t^2 + \cdots + a_n t^n \tag{2.1}$$

is called a polynomial, or more precisely a polynomial in t with coefficients in A.


exampcea The quantity 1 + 3-5! + 4-5 — 2-52 + 6. 5¢is a polynomial in 
V5 with coefficients in Z. 


PROPOSITION 2.1 Let A be a ring, B a subring, and t an element of A. Suppose that bt = tb for every element b of B; and denote by B[t] the set of all elements in A which can be expressed as polynomials in t with coefficients in B. Then B[t] is a subring of A, and any subring of A which contains B and t must contain B[t].

Proof. We have only to show that the sum, difference, and product of any two elements a, b in B[t] are also in B[t]. By assumption the elements can be expressed as polynomials $a = a_0 + a_1 t + \cdots + a_m t^m$ and $b = b_0 + b_1 t + \cdots + b_n t^n$ with coefficients in B. Then $a + b = (a_0 + b_0) + (a_1 + b_1)t + (a_2 + b_2)t^2 +$ etc. This is a polynomial in t, and its coefficients $a_j + b_j$ are in B, since B is a subring. Therefore a + b is in B[t]. The same is true for a − b. For the product we have

$$ab = (a_0 + a_1 t + \cdots + a_m t^m)(b_0 + b_1 t + \cdots + b_n t^n)$$
$$= a_0 b_0 + a_0 b_1 t + a_1 t b_0 + \cdots + a_j t^j b_k t^k + \cdots + a_m t^m b_n t^n$$

We have assumed that ct = tc for any c in B. Then, by a simple induction, $ct^k = t^k c$ for any positive integer k (see Sec. 7E, Chap. 2). Therefore the expression above for ab can be written

$$ab = a_0 b_0 + (a_0 b_1 + a_1 b_0)t + (a_0 b_2 + a_1 b_1 + a_2 b_0)t^2 + \cdots + a_m b_n t^{m+n}$$

which is a polynomial in t with coefficients in B. Hence ab is in B[t], and B[t] is a subring of A. If C is any subring of A containing both B and t, then it must contain $t^2$, $t^3$, etc. It follows at once that C contains every polynomial in t with coefficients in B, and so it contains B[t]. Q.E.D.


From the calculations just made it is clear that the coefficient $c_k$ of $t^k$ in the expression for ab is the quantity

$$c_k = a_0 b_k + a_1 b_{k-1} + a_2 b_{k-2} + \cdots + a_{k-1} b_1 + a_k b_0 \tag{2.2}$$

where it is understood that $a_j = 0$ if j exceeds m and $b_k = 0$ if k exceeds n.
It is obvious that B[t] must be a commutative ring if B is commutative, assuming always that t commutes with every element of B.
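Formula (2.2) is exactly the rule by which polynomial products are computed in practice. The following Python sketch applies it to coefficient lists; the function name `poly_mul`, the convention that `a[j]` holds the coefficient of $t^j$, and the sample polynomials are our own choices, and integer coefficients are used only for illustration.

```python
def poly_mul(a, b):
    """Coefficients c_k of the product, computed from (2.2); a[j] multiplies t^j."""
    m, n = len(a) - 1, len(b) - 1
    c = [0] * (m + n + 1)
    for k in range(m + n + 1):
        for j in range(k + 1):
            # Terms with j > m or k - j > n are omitted: those coefficients are zero.
            if j <= m and k - j <= n:
                c[k] += a[j] * b[k - j]
    return c

product = poly_mul([1, 2], [3, 0, 1])   # (1 + 2t)(3 + t^2) = 3 + 6t + t^2 + 2t^3
```

The result is the list `[3, 6, 1, 2]`, in agreement with multiplying out by hand.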


DEFINITION 2.1 The ring B[t] of the preceding proposition is called the ring obtained by adjoining t to B.
The following definition is fundamental: 


DEFINITION 2.2 Let A be a ring, let B be a subring, and let x be an element of A which 
commutes with every element of B. Then x is said to be an indeterminate over B 
(or to be variable over B, or transcendental over B) if no polynomial in x with co- 
efficients in B is zero unless all the coefficients are zero. In the contrary case x is said 
to be algebraic over B, provided that B is an integral domain. 

In other words, the requirement for x to be variable over B is that bx = xb for every b in B and that x should not satisfy any polynomial equation with coefficients in B, unless of course all the coefficients are zero. In the case that A is an integral domain, the assumption that x commutes with every element of B is superfluous, since multiplication in an integral domain is commutative, by definition.


EXAMPLE 2 The complex number $\omega = -(1/2) + (\sqrt{3}/2)i$ is algebraic over Q (also over Z, R) because $\omega^3 - 1 = 0$. The real number $a = -2 + \sqrt{2}$ is algebraic over Q, since $a^2 + 4a + 2 = 0$, as is easily verified. Complex numbers which are algebraic over Q are called algebraic numbers. All others are called transcendental numbers. It is known that the number π is transcendental. To repeat the definition, this means that if

$$a_0 + a_1 \pi + a_2 \pi^2 + \cdots + a_n \pi^n = 0$$

where the coefficients $a_0, a_1, \ldots, a_n$ are rational, then they must all be zero. In other words, π does not satisfy any polynomial equation with nonzero rational coefficients. The number e is also transcendental. In fact, "nearly all" real numbers are transcendental. For it can be shown fairly easily that all algebraic numbers in R form a countable subset, whereas R itself is uncountable. It is usually a very difficult problem to determine whether a given number is algebraic or transcendental.


By an indeterminate, or variable, over a ring B we shall mean an element of a ring containing B for which the conditions of Definition 2.2 are satisfied. Thus, if x is variable over B, then x must commute with every element of B. From the definition we have the following immediate consequence:


PROPOSITION 2.2 Let x be a variable over a ring B. Then B[x] is a ring, and every nonzero element of B[x] can be expressed in one and only one way as a polynomial

$$f(x) = a_0 + a_1 x + \cdots + a_n x^n \tag{2.3}$$

with coefficients $a_0, a_1, \ldots, a_n$ in B and $a_n \ne 0$.†


Proof. B[x] is a ring, by Proposition 2.1. Suppose that an element of B[x] can be expressed in two different ways

$$a_0 + a_1 x + \cdots + a_n x^n = b_0 + b_1 x + \cdots + b_m x^m$$

and for definiteness suppose that m ≥ n. Subtracting, we obtain

$$(a_0 - b_0) + (a_1 - b_1)x + \cdots + (a_n - b_n)x^n - b_{n+1} x^{n+1} - \cdots - b_m x^m = 0$$

From Definition 2.2, all the coefficients here must be zero. Thus $a_j = b_j$ for j = 0, 1, ..., n, and $b_j = 0$ for j > n. The assertion follows at once. Q.E.D.


The notation f(x) of (2.3) is simply an abbreviation, of which frequent use will be made.


DEFINITION 2.3 Let x be a variable over a ring B. Then B[x] is called the ring of polynomials in x over B, or with coefficients in B. If $f(x) = a_0 + a_1 x + \cdots + a_n x^n$ is an element of B[x] and if $a_n \ne 0$ (the $a_j$ are supposed to be in B), then n is called the degree of f(x), denoted by deg f(x). The coefficient $a_n$ is called the highest coefficient of f(x); if $a_n = 1$ (unit element of B), then f(x) is called a monic polynomial.


† Naturally any number of terms $0 \cdot x^{n+1} + 0 \cdot x^{n+2} +$, etc., can be added on; but then the uniqueness is destroyed.


136 Polynomials Ch. 6, See. 2 


Observe that no degree is assigned to the zero element of Blx]; we consider it to 
have any degree whatever in order to avoid annoying exceptions. For example, if 
we speak of all polynomials of degree 5, we understand that 0 is among them. 

We shall show presently that there is always a variable element, or indeterminate, 
over any ring B. Therefore a polynomial ring over B always exists. 


PROPOSITION 2.3 Let B[x] be a polynomial ring over a ring B, and let f(x) and g(x)
be two nonzero elements of B[x]. Then deg (f(x) + g(x)) ≤ max {deg f(x), deg g(x)}.‡
If B is an integral domain, then deg (f(x) · g(x)) = deg f(x) + deg g(x), and B[x] is
an integral domain. If f(x) and g(x) are monic, then so is f(x) · g(x).

Proof. Let f(x) have degree m, with highest coefficient $a_m$, and let g(x)
have degree n, with highest coefficient $b_n$. It is plain that f(x) + g(x) cannot
have degree greater than both m and n, which is the first assertion.
From (2.2) it follows at once that the highest power of x which can appear
with nonzero coefficient in f(x) · g(x) is $x^{m+n}$, the coefficient being
$a_m b_n$. By assumption, $a_m \neq 0$ and $b_n \neq 0$. If B is an integral domain,
then $a_m b_n \neq 0$ (see Theorem 2.4, Chap. 2), and so f(x) · g(x) has degree
m + n, the highest coefficient being $a_m b_n$. In particular, f(x) · g(x) ≠ 0.
As we have already observed, if B is a commutative ring, then so is B[x].
Hence, if B is an integral domain, then B[x] is a commutative ring in
which the product of two nonzero elements is again nonzero. In other
words, B[x] is also an integral domain. The last assertion of the proposition
is obvious. Q.E.D.
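
The degree statements of Proposition 2.3 can be tried out on small cases. The following is only a sketch, assuming B is the ring of integers and a polynomial is stored as its coefficient list [a_0, a_1, ..., a_n]; the names trim, degree, poly_add, and poly_mul are illustrative and do not come from the text.

```python
def trim(p):
    """Drop trailing zeros so the last entry is the highest coefficient."""
    p = list(p)
    while p and p[-1] == 0:
        p.pop()
    return p

def degree(p):
    """Degree of a nonzero polynomial; None for zero (no degree is assigned)."""
    p = trim(p)
    return len(p) - 1 if p else None

def poly_add(f, g):
    """Coefficientwise sum of two coefficient lists."""
    n = max(len(f), len(g))
    f = list(f) + [0] * (n - len(f))
    g = list(g) + [0] * (n - len(g))
    return trim(a + b for a, b in zip(f, g))

def poly_mul(f, g):
    """Product: the coefficient of x^k is the sum of a_i * b_j with i + j = k."""
    f, g = trim(f), trim(g)
    if not f or not g:
        return []
    c = [0] * (len(f) + len(g) - 1)
    for i, a in enumerate(f):
        for j, b in enumerate(g):
            c[i + j] += a * b
    return trim(c)

f = [1, 0, 2]       # 1 + 2x^2, degree 2
g = [3, 1, 0, -2]   # 3 + x - 2x^3, degree 3
print(degree(poly_add(f, g)))  # 3, which does not exceed max(2, 3)
print(degree(poly_mul(f, g)))  # 5, which equals 2 + 3
```

Over a ring with zero divisors the product of the two highest coefficients may vanish, and then the degree of the product is smaller than the sum of the degrees; the integers were chosen here precisely because they form an integral domain.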


Again let x denote a variable element over a ring B, and let


(2.4)    $f(x) = a_0 + a_1 x + \cdots + a_n x^n$


be any element of the polynomial ring B[x], the elements $a_0, \ldots, a_n$ being in B.
Let t be any element of B, or of some ring containing B. Then we can form the
element


(2.5)    $f(t) = a_0 + a_1 t + \cdots + a_n t^n$


simply replacing x in (2.4) by the element t. The following theorem concerns this
substitution operation.


THEOREM 2.4 Let x be an indeterminate over a ring B. Let t be any element of a ring
containing B as a subring, and suppose that t commutes with every element of B.
Then the operation that sends every polynomial f(x) in B[x] into f(t) is a
ring-homomorphism from B[x] to B[t].

Proof. First of all, the operation defines a mapping from B[x] to B[t],
since elements of B[x] have unique expressions as polynomials (2.4), apart


‡ The symbol max {a, b} denotes the larger of the numbers a, b or else their common value
if they are equal.




from terms with coefficient zero. The theorem states that if f(x) and
g(x) are any two elements of B[x], then the results of forming f(x) + g(x)
and f(x) · g(x) and thereupon replacing x by t are f(t) + g(t) and f(t) · g(t),
respectively. The verification is perfectly trivial, since calculations with
polynomials in x are the same as calculations with polynomials in t, both
being as in the proof of Proposition 2.1 above. Observe, however, that
the assumption that t (as well as x) commutes with every element of B
is essential, since otherwise the calculation of the product in the proof
of Proposition 2.1 is not correct.


As a special case, suppose that B is a commutative ring, and let t denote any
element of B. It is clear that B[t] is none other than B. The hypotheses of the
theorem are fulfilled, and therefore substitution of t for x defines a homomorphism
from B[x] to B itself.

Homomorphisms of the sort just described will be called substitution
homomorphisms, for obvious reasons. The whole point here is that an element f(x) in
the polynomial ring B[x] has a unique expression as a polynomial with coefficients
in B (unique, that is, apart from terms with zero coefficients). But the
corresponding element f(t) may have many different expressions as a polynomial in t
with coefficients in B. That is, we may have f(t) = g(t) but f(x) ≠ g(x). For
example, let B be the ring of integers, and take f(x) = x − 1 and g(x) = 2 [hence
g(x) has degree zero]. We have f(3) = g(3) = 2, but f(x) ≠ g(x).
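
The substitution homomorphism can be illustrated numerically. The sketch below assumes B is the ring of integers and stores a polynomial as its coefficient list [a_0, ..., a_n]; the helper names evaluate, poly_add, and poly_mul are illustrative, not notation from the text. It checks on one instance that substituting after adding or multiplying agrees with adding or multiplying after substituting, and it reproduces the example f(x) = x − 1, g(x) = 2.

```python
def poly_add(f, g):
    """Coefficientwise sum of two coefficient lists."""
    n = max(len(f), len(g))
    return [(f[i] if i < len(f) else 0) + (g[i] if i < len(g) else 0)
            for i in range(n)]

def poly_mul(f, g):
    """Product of two coefficient lists."""
    if not f or not g:
        return []
    c = [0] * (len(f) + len(g) - 1)
    for i, a in enumerate(f):
        for j, b in enumerate(g):
            c[i + j] += a * b
    return c

def evaluate(p, t):
    """Replace x by t in p, working down from the highest coefficient."""
    result = 0
    for a in reversed(p):
        result = result * t + a
    return result

f = [-1, 1]     # x - 1
g = [2]         # the constant polynomial 2
h = [5, 0, 3]   # 5 + 3x^2
t = 3

print(evaluate(poly_add(f, h), t) == evaluate(f, t) + evaluate(h, t))  # True
print(evaluate(poly_mul(f, h), t) == evaluate(f, t) * evaluate(h, t))  # True
print(evaluate(f, t) == evaluate(g, t), f == g)                        # True False
```

The last line shows the point made in the text: the values f(3) and g(3) coincide although the polynomials f(x) and g(x) are different elements of B[x].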

In this situation it is clear that trying to define an inverse mapping B[t] → B[x]
by the rule f(t) → f(x) would be nonsense, since the same rule would send g(t) = f(t)
into g(x) ≠ f(x). In other words, the mapping would not be well defined. From
Definition 2.2 it follows that every element of B[t] has a unique expression as a
polynomial in t with coefficients in B (apart from terms with coefficient zero) if
and only if t is also variable over B (we continue with the assumption that t
commutes with all elements of B). In this case the inverse mapping f(t) → f(x) does
make sense. Applying the theorem above to it, we have the following corollary:


COROLLARY 1 If t is also variable over B, then the mapping f(x) → f(t) is a
ring-isomorphism from B[x] to B[t].
For this mapping has an inverse and is therefore one-to-one.


EXAMPLE 3 If x is variable over B, then so is $x^2$ (in fact, every element of B[x] not
in B is variable over B). Hence, the substitution $x \to x^2$ determines an isomorphism
from B[x] to $B[x^2]$. The latter is a subring of B[x].


COROLLARY 2 Let B[x] and B[y] be two polynomial rings over a ring B. Then there is
a unique isomorphism from B[x] to B[y] sending x into y and elements of B into
themselves.

For both x and y are variable elements over B, by definition of polynomial ring.
The assertion follows from Corollary 1.




COROLLARY 3 Let x be an indeterminate over a ring B. Any equation of polynomials
in x with coefficients in B remains true if x is replaced by any element t of any ring
containing B as a subring, provided t commutes with every element of B.

The idea here is that an equation such as $(x - 3)(x + 2)(x^2 + 3x + 2) =
x^4 + 2x^3 - 7x^2 - 20x - 12$, for example, which is correct in B[x] remains true if
x is replaced by t. Now that is exactly what Theorem 2.4 says; it simply states
that sums and products can be performed either before or after substitution of t
for x.
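
The sample identity above can be checked mechanically. The following sketch assumes integer coefficients stored as lists [a_0, ..., a_n]; the element t = 1 + 2i of the complex numbers is an arbitrary choice made for illustration, and the helper names poly_mul and evaluate are not from the text.

```python
def poly_mul(f, g):
    """Product of two coefficient lists."""
    c = [0] * (len(f) + len(g) - 1)
    for i, a in enumerate(f):
        for j, b in enumerate(g):
            c[i + j] += a * b
    return c

def evaluate(p, t):
    """Replace x by t in p, working down from the highest coefficient."""
    result = 0
    for a in reversed(p):
        result = result * t + a
    return result

# Left side (x - 3)(x + 2)(x^2 + 3x + 2), multiplied out in Z[x].
lhs = poly_mul(poly_mul([-3, 1], [2, 1]), [2, 3, 1])
# Right side x^4 + 2x^3 - 7x^2 - 20x - 12.
rhs = [-12, -20, -7, 2, 1]
print(lhs == rhs)  # True

# Substitute an element of a larger ring (here a complex number with
# small integer parts, so the floating-point arithmetic is exact).
t = complex(1, 2)
left_value = (t - 3) * (t + 2) * (t * t + 3 * t + 2)
print(left_value == evaluate(rhs, t))  # True
```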

The following theorem shows that polynomial rings over a given ring B always 
exist: 


THEOREM 2.5 Let B be a ring. Then there exists a ring A containing B as a subring
and containing an element x which is variable over B.

Proof. The construction of A is very simple. Namely, we let A be
the set of all infinite sequences $(a_0, a_1, a_2, \ldots)$ of elements of B such that
only finitely many elements $a_i$ are nonzero. Addition in A is defined by the
rule


(2.6)    $(a_0, a_1, a_2, \ldots) + (b_0, b_1, b_2, \ldots) = (a_0 + b_0, a_1 + b_1, a_2 + b_2, \ldots)$


This operation gives again an element of A, because only finitely many of
the elements $a_0 + b_0$, $a_1 + b_1$, etc., can be different from zero, since that is
true of the two sequences on the left. It is a very straightforward matter
to verify that A with this operation + is an abelian group. Its zero
element is (0, 0, 0, ...), and the inverse of $(a_0, a_1, a_2, \ldots)$ for addition
is the element $(-a_0, -a_1, -a_2, \ldots)$.

Multiplication in A is defined by the rule


(2.7)    $(a_0, a_1, a_2, \ldots) \cdot (b_0, b_1, b_2, \ldots) = (c_0, c_1, c_2, \ldots)$

where $c_k$ is given (for any $k \geq 0$) by the formula

(2.8)    $c_k = a_0 b_k + a_1 b_{k-1} + a_2 b_{k-2} + \cdots + a_{k-2} b_2 + a_{k-1} b_1 + a_k b_0$


which is the same as (2.2). By assumption, only finitely many of the $a_i$
and $b_i$ are different from zero, and so we shall have, say, $a_i = b_i = 0$ for
all i > p. Then from (2.8) it is clear that $c_k = 0$ for k > 2p. Hence,
only finitely many of the $c_i$ can be different from zero, and therefore the
right-hand side of (2.7) is indeed an element of A.

It is a straightforward matter to verify that A, with sum and product
defined in this way, is a ring; its unit element is (1, 0, 0, 0, ...), where 1
denotes the unit element of B. Now put x = (0, 1, 0, 0, ...). From
(2.7) and (2.8) a simple induction shows that $x^n = (0, 0, \ldots, 0, 1, 0, 0, \ldots)$
for any positive integer n, where 1 stands in the (n + 1)th place,
all other entries being zero. Furthermore, from (2.7) and (2.8) it is easy
to verify that


$(a, 0, 0, 0, \ldots) \cdot x^n = (0, 0, \ldots, 0, a, 0, 0, \ldots)$


where on the right the element a appears in the (n + 1)th place, zeros
elsewhere. Then from the definition of addition in A it follows that for
any element of A, say $(a_0, a_1, \ldots, a_n, 0, 0, 0, \ldots)$, we have


(2.9)    $(a_0, a_1, \ldots, a_n, 0, 0, \ldots) = \sum_{k=0}^{n} (a_k, 0, 0, 0, \ldots) \cdot x^k$


where as usual $x^0$ stands for the unit element (1, 0, 0, ...) of A.

Now let B′ be the subset of A consisting of all elements (a, 0, 0, 0, ...)
with zeros beyond the first place. For such elements the definitions above
reduce to


(a, 0, 0, ...) + (b, 0, 0, ...) = (a + b, 0, 0, ...)
and
(a, 0, 0, ...) · (b, 0, 0, ...) = (ab, 0, 0, ...)


Therefore B′ is a subring of A. Let us write a′ for the element
(a, 0, 0, ...) of B′, so that (2.9) above can be written

(2.10)    $(a_0, a_1, \ldots, a_n, 0, 0, \ldots) = \sum_{k=0}^{n} a'_k x^k$

Since the zero element of A is (0, 0, 0, ...), it follows from this that the
right-hand side of (2.10) cannot be zero unless $a'_0, a'_1, a'_2, \ldots$ are all
zero. Since x commutes with every element of B′, as is easily seen, it
follows that x is variable over the subring B′ of A. We observe that
A = B′[x].

Now A does not contain B, but it contains a subring B′ which is isomorphic
to B. Namely, the mapping B → B′ defined in the obvious manner
by a → (a, 0, 0, ...) = a′ is an isomorphism. If we insist upon
having the original B instead of the isomorphic copy B′, then we can
simply replace the elements of B′ by the corresponding elements of B
(retaining for them the operations defined in A for elements of B′).  Q.E.D.
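
The construction in the proof can be imitated directly on a computer. The following is a minimal sketch, assuming B is the ring of integers and representing a sequence with finitely many nonzero entries by the tuple of its entries up to the last nonzero one (so the zero element is the empty tuple); the names normalize, seq_add, seq_mul, seq_pow, embed, and X are illustrative only.

```python
def normalize(seq):
    """Cut off the infinite tail of zeros."""
    seq = list(seq)
    while seq and seq[-1] == 0:
        seq.pop()
    return tuple(seq)

def seq_add(a, b):
    """Addition rule (2.6): add entry by entry."""
    n = max(len(a), len(b))
    a = tuple(a) + (0,) * (n - len(a))
    b = tuple(b) + (0,) * (n - len(b))
    return normalize(p + q for p, q in zip(a, b))

def seq_mul(a, b):
    """Multiplication rule (2.7), with c_k given by formula (2.8)."""
    if not a or not b:
        return ()
    c = []
    for k in range(len(a) + len(b) - 1):
        c.append(sum(a[i] * b[k - i]
                     for i in range(k + 1)
                     if i < len(a) and k - i < len(b)))
    return normalize(c)

ONE = (1,)   # unit element (1, 0, 0, ...)
X = (0, 1)   # the element x = (0, 1, 0, 0, ...)

def seq_pow(a, n):
    """n-fold product of a with itself (n >= 0)."""
    result = ONE
    for _ in range(n):
        result = seq_mul(result, a)
    return result

def embed(a):
    """The element a' = (a, 0, 0, ...) of the subring B'."""
    return normalize((a,))

print(seq_pow(X, 3))                       # (0, 0, 0, 1)
print(seq_mul(embed(5), seq_pow(X, 2)))    # (0, 0, 5)

# Formula (2.10): a sequence is the sum of the terms a'_k x^k.
element = (4, 0, -1, 7)
total = ()
for k, a_k in enumerate(element):
    total = seq_add(total, seq_mul(embed(a_k), seq_pow(X, k)))
print(total == element)                    # True
```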


The theorem shows that, for any ring B, there always exists an element x which
is variable over B, and therefore there always exists a polynomial ring B[x] over B.
Any two polynomial rings over B are isomorphic, by Corollary 2 of Theorem 2.4.
And B[x] is an integral domain if B is an integral domain. In the remainder of this
chapter we shall deal only with integral domains.

We point out that the notion of a variable element depends very strongly on 
the ring in question. For example, if x is variable over B, it is certainly not variable 




over B[x]. The number π is variable over Q, as noted above; but it is not variable
over R.

The terms variable and indeterminate, entirely synonymous here, are
conventional terms; they have nothing to do with something “varying” or being
“undetermined.”


EXERCISES 


1. Carry out the verifications in the proof of Theorem 2.5. 

2. Let B be a subring of a commutative ring A, and let t be an element of A, s
an element of B. Show that B[t] = B[t + s].

3. Let y be a variable element over a ring B. Prove that every element in B[y]
which is not in B is variable over B.

4. Let B be a subring of a field K, and let t be an element of K different from
zero. Prove that t is variable over B if and only if $t^{-1}$ is variable over B.

5. Prove that every element of the ring $Q[\sqrt{2}]$ can be written uniquely in the
form $a + b\sqrt{2}$, where a and b are rational numbers. Prove that this ring is a
field. Do the same for $Q[\sqrt{-2}]$.

6. Let B be an integral domain, and let x be an indeterminate over B. Prove
that if an element u of B[x] has an inverse in B[x] (for multiplication), then u is
an element of B which has an inverse in B.

7. Let w be the complex number $w = \cos(2\pi/n) + i \sin(2\pi/n)$, where n is a
positive integer. Describe the ring Q[w]. That is, show that every element can
be written uniquely in the form $a_0 + a_1 w + \cdots + a_{n-1} w^{n-1}$, where the $a_i$ are
rational numbers. (This ring is also a field.)

8. Let A be an ordered integral domain. Let J consist of all elements
$a_n x^n + a_{n-1} x^{n-1} + \cdots + a_0$ of A[x] with $a_n > 0$. Show that this makes A[x] into an
ordered integral domain. Note that A[x] is not Archimedean: there are elements
greater than any integer.

9. Show that $z^n + z^{-n}$ is a polynomial in $z + z^{-1}$, where z denotes an element
in a field. [Hint: Use induction and consider separately the cases: n even and
n odd.]


3. Factorization of polynomials


We now take up an investigation that parallels very closely the discussion of 
Sec. 8, Chap. 2. We begin with some general definitions. 


DEFINITION 3.1 Let D be an integral domain containing more than one element, and
let a and b be two elements of D, with a ≠ 0. Then a is said to divide b if there is
an element c in D such that ac = b. An element which divides 1 is called invertible.
An element p ≠ 0 of D is called prime if it is not invertible and if the only elements of
D which divide p are invertible elements and the elements up, where u is invertible.

An invertible element u is simply one that has an inverse $u^{-1}$ in D. Invertible




elements are divisible only by invertible elements; for if u is invertible and if a
divides u, then u = ac for some c in D. Then $1 = a \cdot (c u^{-1})$, showing that a is
invertible. Any element b is divisible by any invertible element u, for
$b = u \cdot (u^{-1} b)$. If a divides b and if u, v are invertible, then ua divides vb.
Therefore divisibility relationships are not affected by invertible factors. If a divides
b and b divides a, then b = ua, where u is invertible. For by definition b = ac and
a = bc′ for some elements c, c′ in D. But then a = acc′, and so from the cancellation
law we have 1 = cc′, showing that c and c′ are invertible.


EXAMPLE 1 The only invertible elements in Z are 1 and −1. Prime elements of
Z according to Definition 3.1 are just prime numbers, as defined in Sec. 8, Chap. 2.

Every nonzero element of a field K is invertible and is divisible by any other 
nonzero element. 


DEFINITION 3.2 Let $f_1, f_2, \ldots, f_n$ be elements of an integral domain D, not all
zero. Then an element h of D is called a greatest common divisor (g.c.d.) of
$f_1, f_2, \ldots, f_n$ if
(1) h divides $f_1, f_2, \ldots, f_n$.
(2) Any element of D which divides $f_1, \ldots, f_n$ must also divide h.
If h, h′ are two g.c.d. of $f_1, f_2, \ldots, f_n$, then by (1) and (2) they must divide
each other. Therefore as pointed out above, h′ = u · h, where u is an invertible
element of D. Conversely, if h satisfies (1) and (2), then clearly so does uh,
for any invertible element u. Hence a greatest common divisor of a set of elements
in D (if one exists) is unique apart from an arbitrary invertible factor. A
greatest common divisor need not exist in general.


DEFINITION 3.3 Elements $f_1, f_2, \ldots, f_n$ of an integral domain D are said to be
relatively prime, or coprime, if 1 is a greatest common divisor of them.
In this case any invertible element of D is also a g.c.d.


The definitions given above are clearly just straightforward extensions of
definitions given in Sec. 8, Chap. 2. We now apply them to the case of a polynomial
ring K[x] over a field K.

First of all, suppose that f is an invertible element in K[x]. Then there is an
element g in K[x] such that fg = 1. That is, $g = f^{-1}$. Then deg (fg) = deg f + deg g
= deg 1 = 0, and so deg f = deg g = 0. Therefore the invertible elements of
K[x] are precisely the nonzero elements of K. A prime element of K[x] is usually
called an irreducible polynomial. If p(x) is irreducible, then, according to
Definition 3.1, p(x) must have positive degree and it cannot be expressed as a product
p(x) = f(x) · g(x) of two polynomials of positive degree. Clearly any polynomial
of degree 1 is irreducible.


EXAMPLE 2 The polynomial $x^2 - 2$ is irreducible in Q[x]. For otherwise we would
have $x^2 - 2 = f(x) \cdot g(x)$, with deg f > 0 and deg g > 0. The only possibility is
then deg f = deg g = 1, and so we have, say, $x^2 - 2 = (ax + b)(cx + d)$, where




a, b, c, d are rational numbers. We have ac = 1, and so a ≠ 0. By Theorem 2.4
(Corollary 3) the equation remains true if we replace x by −b/a (or any other
element of Q[x]). We get then $(b/a)^2 - 2 = 0$, which is impossible, as we have
seen in Sec. 1, Chap. 4. The polynomial $x^2 - 2$ is however reducible in R[x], for
$x^2 - 2 = (x - \sqrt{2})(x + \sqrt{2})$, and both factors are in R[x]. The polynomial
$x^2 + 1$ is irreducible in R[x] (similar argument) but is reducible in C[x], namely,
$x^2 + 1 = (x + i)(x - i)$.

Irreducible polynomials are analogous to prime numbers. But while it is rela- 
tively easy to determine whether a given integer is prime, it is often very difficult 
to decide whether a given polynomial is irreducible. 

From the remarks following Definition 3.2 we see that the g.c.d. of two nonzero
elements of K[x] is unique apart from an arbitrary nonzero factor in K. (We shall
show below that a g.c.d. always exists.) Then if h(x) is a g.c.d. of f(x) and g(x),
so is ch(x) for any nonzero element c of K, and all g.c.d. of f(x) and g(x) are of
this form. If b is the highest coefficient of h(x), then $b^{-1} \cdot h(x)$ is a monic
polynomial. We can assume that h(x) is already monic, and it is then unique. We
call it the greatest common divisor of f and g, denoted by (f, g). Thus,


(3.1)    (f, g) = the unique monic g.c.d. of f(x) and g(x)


for any two nonzero polynomials. In particular, (f, g) = 1 if and only if f and
g are relatively prime, by Definition 3.3.


THEOREM 3.1 Any polynomial f(x) in K[x] of positive degree is divisible by an
irreducible polynomial in K[x].
Proof. Suppose the assertion is false, and let f(x) be a polynomial of
least degree n > 0 which is not divisible by an irreducible polynomial.
Then f(x) cannot itself be irreducible because f divides f. Hence f(x) =
g(x) · h(x), where g and h are polynomials of positive degree in K[x].
But then deg g = deg f − deg h < n, and so g(x) is divisible by an
irreducible polynomial p(x), by definition of n. Clearly p(x) also divides
f(x), a contradiction. Q.E.D.


We recall that the key tool of Sec. 8, Chap. 2, was the division algorithm. A 
similar theorem holds for polynomials, and it has analogous consequences. The 
theorem simply asserts the possibility of “long division.” Recall that K is supposed
to be a field, so that every nonzero element c of K has an inverse $c^{-1}$.


PROPOSITION 3.2 (Division algorithm) Let f(x) and g(x) be two nonzero elements of
K[x]. Then there exist two polynomials q(x) and r(x) in K[x] such that


(3.2)    f(x) = q(x) · g(x) + r(x)    and either r(x) = 0 or else
         deg r(x) < deg g(x)


Moreover q(x) and r(x) are uniquely determined by these conditions.




Proof. The uniqueness of q and r is very easily shown. For let f(x) =
q′(x) · g(x) + r′(x), where either r′ = 0 or else deg r′ < deg g. Then we
have q(x) · g(x) + r(x) = q′(x) · g(x) + r′(x), whence (q(x) − q′(x)) · g(x)
= r′(x) − r(x). If q ≠ q′, then deg (q − q′) ≥ 0. By Proposition 2.3
the degree of the left-hand side is ≥ deg g, while the degree of the right-hand
side is < deg g, a contradiction. Therefore, q′(x) = q(x), and so
r′(x) = r(x).

To prove the existence of q and r we use induction on n = deg f(x).
The process amounts to long division. If deg f = 0, then the assertion is
clearly true. For if deg g > 0, then we just take q(x) = 0 and r(x) = f(x).
If deg g = 0, then f and g are simply two elements of K, and we take then
r = 0 and q = f/g.

Suppose then that the assertion holds if deg f ≤ n. We show that it
must hold then if deg f = n + 1. We have $f(x) = a_0 + a_1 x + \cdots + a_{n+1} x^{n+1}$,
where $a_{n+1} \neq 0$, and, say, $g(x) = b_0 + b_1 x + \cdots + b_m x^m$,
where $b_m \neq 0$. If n + 1 < m, then we put q(x) = 0, r(x) = f(x) and we
are done. If n + 1 ≥ m, then the polynomial


$f(x) - \frac{a_{n+1}}{b_m} x^{n+1-m} \cdot g(x)$


has degree ≤ n clearly, and therefore by assumption it can be written in
the form $q_1(x) \cdot g(x) + r(x)$, where either r = 0 or else deg r < deg g.
We have then


$f(x) = \left( q_1(x) + \frac{a_{n+1}}{b_m} x^{n+1-m} \right) \cdot g(x) + r(x)$


and so the assertion that q(x), r(x) exist holds for n + 1. It follows by
induction (Sec. 7A, Chap. 2) that the assertion is true for all n. Q.E.D.


The polynomial q(x) of the theorem is called the quotient of f(x) by g(x); r(x) is
called the remainder. Clearly g(x) divides f(x) (Definition 3.1) if and only if
r(x) = 0. Observe that if g(x) is monic, then $b_m$ above is 1 and so it is not
necessary to assume the existence of inverses (for multiplication). That is, the proof
works in this case if K is any integral domain.
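
The induction in the proof of Proposition 3.2 is ordinary long division, and it can be written as a loop. The following is a minimal sketch, assuming K is the field of rational numbers (Python's fractions.Fraction) and that a polynomial is stored as its coefficient list [a_0, ..., a_n]; the names trim and poly_divmod are illustrative.

```python
from fractions import Fraction

def trim(p):
    """Drop trailing zeros so the last entry is the highest coefficient."""
    p = list(p)
    while p and p[-1] == 0:
        p.pop()
    return p

def poly_divmod(f, g):
    """Return (q, r) with f = q*g + r and either r == [] or deg r < deg g."""
    r = trim(Fraction(a) for a in f)
    g = trim(Fraction(b) for b in g)
    if not g:
        raise ZeroDivisionError("division by the zero polynomial")
    q = [Fraction(0)] * max(len(r) - len(g) + 1, 0)
    while len(r) >= len(g):
        shift = len(r) - len(g)      # exponent of the next quotient term
        coeff = r[-1] / g[-1]        # highest coefficient of r over that of g
        q[shift] = coeff
        # Subtract coeff * x^shift * g(x); this removes the highest term of r.
        r = trim(a - coeff * g[i - shift] if i >= shift else a
                 for i, a in enumerate(r))
    return trim(q), r

# 3x^2 + 2x + 1 divided by x^2 - x + 2
q, r = poly_divmod([1, 2, 3], [2, -1, 1])
print(q == [3], r == [-5, 5])   # True True: quotient 3, remainder 5x - 5
```

Each pass through the loop performs one step of the induction: the highest term of the current remainder is cancelled, so its degree drops and the loop must stop.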


THEOREM 3.3 (Euclidean algorithm) Let f(x) and g(x) be two nonzero polynomials
in K[x]. Then f(x) and g(x) have a greatest common divisor in K[x], and it can be
obtained by the following process† of successive division: write $a_0$ for one of the two
given polynomials and $a_1$ for the other. Using the division algorithm divide $a_0$ by $a_1$,
getting $a_0 = q_1 a_1 + a_2$, where $q_1$ is the quotient and $a_2$ is the remainder. If $a_2 \neq 0$,
then divide $a_1$ by $a_2$, getting $a_1 = q_2 a_2 + a_3$, where $q_2$ is the quotient and $a_3$ the
remainder, etc. Then the last nonzero remainder so obtained is a g.c.d. of f(x) and g(x).


† The g.c.d. obtained in this way will not usually be monic.




Proof. The argument is essentially the same as for integers. The
successive divisions produce a series of equations


(3.3)    $a_0 = q_1 a_1 + a_2$
         $a_1 = q_2 a_2 + a_3$
         $a_2 = q_3 a_3 + a_4$
         . . . . . . . . . .
         $a_{k-2} = q_{k-1} a_{k-1} + a_k$
         $a_{k-1} = q_k a_k + 0$


Here we have assumed that $a_k$ is the last nonzero remainder. Then
$a_2, a_3, \ldots, a_{k-1}, a_k$ are all nonzero, and from Proposition 3.2 we have
$\deg a_1 > \deg a_2 > \deg a_3 > \cdots > \deg a_{k-1} > \deg a_k$. [From this it
is easy to see that the process (3.3) must terminate after a number of steps
at most equal to $\deg a_1$.] From the last equation we see that $a_k$ divides
$a_{k-1}$. From the equation immediately above, $a_k$ also divides $a_{k-2}$.
Continuing up the list (an induction proof is really involved here) we
conclude that $a_k$ divides $a_1$ and $a_0$, that is, f(x) and g(x). Now suppose that
h(x) divides both f(x) and g(x), that is, $a_0$ and $a_1$. From the first equation
of (3.3) it follows that h(x) must divide $a_2$; hence, from the second
equation, h(x) divides $a_3$. Continuing down the list, we conclude that h(x)
divides $a_k$. Therefore $a_k$ satisfies Definition 3.2. Q.E.D.


EXAMPLE 3 Find the g.c.d. of $f(x) = 3x^2 + 2x + 1$ and $g(x) = x^2 - x + 2$ in
Q[x]. The calculations (3.3) take the form


$3x^2 + 2x + 1 = 3 \cdot (x^2 - x + 2) + (5x - 5)$
$x^2 - x + 2 = \tfrac{1}{5} x \cdot (5x - 5) + 2$
$5x - 5 = (\tfrac{5}{2} x - \tfrac{5}{2}) \cdot 2 + 0$


Hence the last nonvanishing remainder is 2, and this is a g.c.d. of f and g. The
monic polynomial obtained by dividing the polynomial 2 by its highest coefficient,
namely 2, is 1. Hence $(3x^2 + 2x + 1, x^2 - x + 2) = 1$ according to (3.1), and
the two polynomials are relatively prime.


EXAMPLE 4 Find the g.c.d. of $f(x) = x^5 - 1$ and $g(x) = x^4 - x^3 + 2x^2 + x - 3$
in Q[x]. The calculations (3.3) are

$x^5 - 1 = (x + 1) \cdot (x^4 - x^3 + 2x^2 + x - 3) + (-x^3 - 3x^2 + 2x + 2)$

$x^4 - x^3 + 2x^2 + x - 3 = (-x + 4) \cdot (-x^3 - 3x^2 + 2x + 2) + (16x^2 - 5x - 11)$

$-x^3 - 3x^2 + 2x + 2 = -\tfrac{1}{16} (x + \tfrac{53}{16}) \cdot (16x^2 - 5x - 11) + \tfrac{71}{(16)^2} (x - 1)$

$16x^2 - 5x - 11 = \tfrac{(16)^2}{71} \cdot (16x + 11) \cdot \tfrac{71}{(16)^2} (x - 1) + 0$


Hence $71/(16)^2 \cdot (x - 1)$ is a g.c.d. of f and g, and therefore so is x − 1, which is
monic. We have then (f, g) = x − 1.


REMARK. The calculations involved in finding the g.c.d. can often be shortened
by discarding constant factors along the way. Doing that will only alter the end
result by a constant factor which does not matter anyway. For example, the second
remainder $16x^2 - 5x - 11$ above could have been replaced by
$x^2 - \tfrac{5}{16} x - \tfrac{11}{16}$, which would have avoided some of the denominators in the third equation.
Or equally well the first remainder, where it appears in the third equation, could
be multiplied by 16, etc.
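
The process of successive division lends itself to a short program. The following sketch assumes K is the field of rational numbers (fractions.Fraction) and stores polynomials as coefficient lists [a_0, ..., a_n]; the names trim, poly_divmod, and poly_gcd are illustrative. The last nonzero remainder is divided by its highest coefficient, so the result is the monic g.c.d. in the sense of (3.1).

```python
from fractions import Fraction

def trim(p):
    """Drop trailing zeros so the last entry is the highest coefficient."""
    p = list(p)
    while p and p[-1] == 0:
        p.pop()
    return p

def poly_divmod(f, g):
    """Long division in Q[x]: f = q*g + r with r == [] or deg r < deg g."""
    r = trim(Fraction(a) for a in f)
    g = trim(Fraction(b) for b in g)
    if not g:
        raise ZeroDivisionError("division by the zero polynomial")
    q = [Fraction(0)] * max(len(r) - len(g) + 1, 0)
    while len(r) >= len(g):
        shift = len(r) - len(g)
        coeff = r[-1] / g[-1]
        q[shift] = coeff
        r = trim(a - coeff * g[i - shift] if i >= shift else a
                 for i, a in enumerate(r))
    return trim(q), r

def poly_gcd(f, g):
    """Monic g.c.d. of two nonzero polynomials by successive division."""
    a = trim(Fraction(c) for c in f)
    b = trim(Fraction(c) for c in g)
    while b:
        _, remainder = poly_divmod(a, b)
        a, b = b, remainder          # a now holds the last nonzero remainder
    return [c / a[-1] for c in a]    # divide by the highest coefficient

# Example 3: 3x^2 + 2x + 1 and x^2 - x + 2 are relatively prime.
print(poly_gcd([1, 2, 3], [2, -1, 1]) == [1])                            # True
# Example 4: x^5 - 1 and x^4 - x^3 + 2x^2 + x - 3 have g.c.d. x - 1.
print(poly_gcd([-1, 0, 0, 0, 0, 1], [-3, 1, 2, -1, 1]) == [-1, 1])       # True
```

Because exact fractions are used, the awkward constants such as 71/(16)^2 are carried along without rounding and disappear only in the final normalization.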


From Theorem 3.3 we derive the following very important consequence: 


THEOREM 3.4 Let f(x) and g(x) be two nonzero polynomials in K[x], and let h(x) be
their g.c.d. Then there exist polynomials r(x) and s(x) in K[x] such that


(3.4)    h(x) = r(x) · f(x) + s(x) · g(x)


Proof. From (3.3) we have $a_k = a_{k-2} - q_{k-1} a_{k-1}$. From the (unwritten)
equation immediately preceding we get $a_{k-1} = a_{k-3} - q_{k-2} a_{k-2}$,
whence $a_k = a_{k-2} - q_{k-1}(a_{k-3} - q_{k-2} a_{k-2}) = (1 + q_{k-1} q_{k-2}) a_{k-2} - q_{k-1} a_{k-3}$.
Continuing up the list (3.3) we can successively get rid of $a_{k-2}$, etc., until
we finally arrive at $a_k = p \cdot a_0 + q \cdot a_1$, where p and q are certain
polynomials. Now $a_k$ can differ from h(x) by only a constant factor, say
$a_k = c \cdot h(x)$. Then we get $h(x) = (p/c) a_0 + (q/c) a_1 = (p/c) f(x) +
(q/c) g(x)$. Q.E.D.


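There are infinitely many pairs of polynomials r(x) and s(x) which satisfy (3.4).

EXAMPLE 5 From Example 3 we have


$2 = (x^2 - x + 2) - \tfrac{1}{5} x \cdot (5x - 5)$
$\;\; = (x^2 - x + 2) - \tfrac{1}{5} x \cdot [(3x^2 + 2x + 1) - 3 \cdot (x^2 - x + 2)]$
$\;\; = (\tfrac{3}{5} x + 1) \cdot (x^2 - x + 2) - \tfrac{1}{5} x \cdot (3x^2 + 2x + 1)$


Hence

$1 = (f(x), g(x)) = (\tfrac{3}{10} x + \tfrac{1}{2}) \cdot g(x) - \tfrac{1}{10} x \cdot f(x)$

Here

$r(x) = -\tfrac{1}{10} x$    and    $s(x) = \tfrac{3}{10} x + \tfrac{1}{2}$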


EXAMPLE 6 The first three equations of Example 4 have the form $a_0 = q_1 a_1 + a_2$,
$a_1 = q_2 a_2 + a_3$, and $a_2 = q_3 a_3 + a_4$, where $a_4 = 71/(16)^2 \cdot (x - 1)$. Then
$a_4 = a_2 - q_3 a_3 = a_2 - q_3 (a_1 - q_2 a_2) = (1 + q_2 q_3) \cdot a_2 - q_3 a_1 =
(1 + q_2 q_3) \cdot (a_0 - q_1 a_1) - q_3 a_1 = (1 + q_2 q_3) \cdot a_0 - [(1 + q_2 q_3) \cdot q_1 + q_3] \cdot a_1$.
Now $q_2 q_3 = (-x + 4) \cdot (-\tfrac{1}{16}) \cdot (x + \tfrac{53}{16}) = 1/(16)^2 \cdot (16x^2 - 11x - 212)$. Carrying
out the calculations we get $(16)^2 \cdot a_4 = (16x^2 - 11x + 44) \cdot f(x) - (16x^3 +
5x^2 + 17x - 9) \cdot g(x)$. Then $x - 1 = r(x) \cdot f(x) + s(x) \cdot g(x)$, with
$r(x) = \tfrac{1}{71} \cdot (16x^2 - 11x + 44)$ and $s(x) = -\tfrac{1}{71} \cdot (16x^3 + 5x^2 + 17x - 9)$.
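
The back-substitution used in Examples 5 and 6 can be carried along during the divisions instead of being done afterwards. The sketch below assumes K is the field of rational numbers and stores polynomials as coefficient lists [a_0, ..., a_n]; the function name poly_xgcd and the bookkeeping variables are illustrative, not the book's notation. At every stage each remainder is kept together with a pair of multipliers expressing it in terms of f and g.

```python
from fractions import Fraction

def trim(p):
    """Drop trailing zeros so the last entry is the highest coefficient."""
    p = list(p)
    while p and p[-1] == 0:
        p.pop()
    return p

def poly_sub(f, g):
    """Difference f - g of two coefficient lists."""
    n = max(len(f), len(g))
    f = list(f) + [Fraction(0)] * (n - len(f))
    g = list(g) + [Fraction(0)] * (n - len(g))
    return trim(a - b for a, b in zip(f, g))

def poly_mul(f, g):
    """Product of two coefficient lists."""
    if not f or not g:
        return []
    c = [Fraction(0)] * (len(f) + len(g) - 1)
    for i, a in enumerate(f):
        for j, b in enumerate(g):
            c[i + j] += a * b
    return trim(c)

def poly_divmod(f, g):
    """Long division in Q[x]: f = q*g + r with r == [] or deg r < deg g."""
    r, g = trim(f), trim(g)
    q = [Fraction(0)] * max(len(r) - len(g) + 1, 0)
    while len(r) >= len(g):
        shift = len(r) - len(g)
        coeff = r[-1] / g[-1]
        q[shift] = coeff
        r = trim(a - coeff * g[i - shift] if i >= shift else a
                 for i, a in enumerate(r))
    return trim(q), r

def poly_xgcd(f, g):
    """Return (h, r, s) with h the monic g.c.d. and h = r*f + s*g."""
    a0 = trim(Fraction(c) for c in f)
    a1 = trim(Fraction(c) for c in g)
    r0, r1 = [Fraction(1)], []     # multipliers of f for a0 and a1
    s0, s1 = [], [Fraction(1)]     # multipliers of g for a0 and a1
    while a1:
        q, rem = poly_divmod(a0, a1)
        a0, a1 = a1, rem
        r0, r1 = r1, poly_sub(r0, poly_mul(q, r1))
        s0, s1 = s1, poly_sub(s0, poly_mul(q, s1))
    lead = a0[-1]                  # make the g.c.d. monic
    return ([c / lead for c in a0],
            [c / lead for c in r0],
            [c / lead for c in s0])

# Example 5: f = 3x^2 + 2x + 1, g = x^2 - x + 2.
h, r, s = poly_xgcd([1, 2, 3], [2, -1, 1])
print(h == [1])                                   # True
print(r == [0, Fraction(-1, 10)])                 # True: r(x) = -(1/10) x
print(s == [Fraction(1, 2), Fraction(3, 10)])     # True: s(x) = (3/10) x + 1/2
```

The pair returned is the one produced by this particular bookkeeping; as remarked in the text, there are infinitely many other pairs satisfying (3.4).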

The close analogy between the foregoing and Sec. 8, Chap. 2, should be ap- 
parent. Here we use the degree of a polynomial as a kind of measure of its “size.” 
The proofs of the following two theorems are almost the same as the proofs of 
Theorems 8.8 and 8.9, Chap. 2, and are left as exercises. 


THEOREM 3.5 Let f(x) and g(x) be two nonzero elements of K[x], and let p(x) be an
irreducible polynomial in K[x]. If p(x) divides f(x) · g(x), then it must divide either
f(x) or g(x).


COROLLARY If the irreducible polynomial p(x) divides a product $f_1(x) f_2(x) \cdots f_n(x)$
of nonzero elements of K[x], then it must divide one of the factors.


THEOREM 3.6 Let f(x) be a polynomial of degree greater than zero in K[x]. Then
f(x) can be expressed as a product


(3.5)    $f(x) = c \cdot p_1(x) p_2(x) \cdots p_r(x)$


where c is an element of K and where $p_1(x), \ldots, p_r(x)$ are irreducible monic
polynomials in K[x]. Furthermore the expression is unique apart from the order of the
factors. Finally, any irreducible monic polynomial in K[x] which divides f(x) must
coincide with one of the $p_i(x)$.

The expression on the right of (3.5) is called the prime decomposition, or
factorization, of f(x). Some of the factors $p_1, \ldots, p_r$ may be repeated, of course,
and by collecting such factors we can write f(x) in the form


$f(x) = c \cdot p_1(x)^{n_1} p_2(x)^{n_2} \cdots p_k(x)^{n_k}$

where now $p_1, p_2, \ldots, p_k$ are all different, the exponents being positive integers.
Sometimes it is convenient to allow zero exponents also. For example, let
$p_1(x), p_2(x), \ldots, p_k(x)$ be all the irreducible monic polynomials appearing in
the prime decompositions of two polynomials f(x), g(x) in K[x] (of degree greater
than zero). Then we can write


$f(x) = c \cdot p_1(x)^{m_1} p_2(x)^{m_2} \cdots p_k(x)^{m_k}$
and
$g(x) = c' \cdot p_1(x)^{n_1} p_2(x)^{n_2} \cdots p_k(x)^{n_k}$


where now some of the exponents may be zero.


EXERCISES 
1. Give complete proofs of Theorems 3.5, 3.6 and of the corollary to Theorem 3.5.
2. Let f(x) and g(x) be two nonzero elements of K[x]. Let B denote the set of
all elements of the form a(x) · f(x) + b(x) · g(x), where a(x) and b(x) are arbitrary
polynomials in K[x]. Show that B is a subring of K[x], and prove that if
v(x) is in B, then so is u(x) · v(x) for any u(x) in K[x]. Show that B contains a
unique monic polynomial h(x) of lowest degree, and prove that h(x) = (f(x),
g(x)). [Hint: Apply the division algorithm to h(x) and f(x).]
3. Find the g.c.d. of the following polynomials in Q[x]:
(a) 2 42x —2,2%—1 
(b) $x^3 + x^2 - 2x$, $x^3 + 2x^2 - x - 2$
©) vt ~xttr-1, 24a -2, ot + 82 ~ 822 +82 ~4 
For (a) and (b) find polynomials r(x) and s(x) satisfying (3.4).


4. If $f_1, f_2, \ldots, f_n$ are nonzero elements of K[x], prove that they have a g.c.d.
h(x) in K[x] and show that h(x) is unique if it is required to be monic. Give a
method for computing h(x), and prove that there are elements $s_1, s_2, \ldots, s_n$ in
K[x] such that $h = s_1 f_1 + s_2 f_2 + \cdots + s_n f_n$.


5. Let h be the g.c.d. of two polynomials f, g of positive degree in K[x]. Show
how to obtain the prime factorization of h from the prime factorizations of f and g.

6. With f, g, and h as in Exercise 5, show that $h^n = (f^n, g^n)$.

7. Let a, b, f, g be elements of an integral domain D, all nonzero, and let
af + bg = 1. Prove that (f, g) = 1 and (a, b) = 1.

*8. Let D be an integral domain, and let there be assigned to every nonzero
element c of D an integer, denoted by d(c), such that

(a) d(c) ≥ 0 for any c ≠ 0

(b) d(ab) = d(a) + d(b) for any a, b ≠ 0 in D

(c) d(a + b) ≤ max {d(a), d(b)} for any a, b ≠ 0
Prove that d(1) = 0 and that if c in D is invertible, then d(c) = 0. Suppose further
that if a, b are any two nonzero elements of D, then there exist q and r in D such
that a = qb + r and either r = 0 or d(r) < d(b). (D is then called a Euclidean
ring.) Prove that if d(c) = 0, then c is invertible. Prove that every element a of
D with d(a) > 0 can be expressed as a product $a = p_1 p_2 \cdots p_r$ of prime elements
in D, this expression being unique apart from invertible factors and the order.

9. Let K[x] be a polynomial ring over a field K. Prove that K[x] contains
infinitely many monic irreducible polynomials. (This is trivial if K has infinitely
many elements, for every polynomial x + a of degree 1 is irreducible and monic.)
Conclude that if K is a finite field, then K[x] contains irreducible monic polynomials
of arbitrarily large degree. [Hint: See Theorem 8.3, Chap. 2.]

10. Find the prime factorizations of the following polynomials: 


(a) x? — 1 in Q[z] (b) 2 4 1 in C[x] 
(e) ax? + br +c in Cle] (d) 2% + 1 in Qf} 
@) ®@+2+1in Ce] (f) 2 + ax ~ 3 in Rizl 


11. Determine all irreducible polynomials of degree 2 and 3 in Z,{z]. 

12, Determine all irreducible monic polynomials of degree 2 in Zalz]. 

13. Prove that the invertible elements in an integral domain form a group with 
multiplication as the binary operation. 


4. Roots of polynomials 


Let f(x) be an element of a polynomial ring K[x], where K is a field. Then f(x)
can be expressed in the form


$f(x) = a_0 + a_1 x + \cdots + a_n x^n$


and this expression is unique if $a_n \neq 0$. Now let c be an element of K (or of some
larger ring containing K as a subring). Then we recall that f(c) stands for


$f(c) = a_0 + a_1 c + \cdots + a_n c^n$


If deg f(x) = 0, that is, if $f(x) = a_0$, then $f(c) = a_0$ for any c. For this reason we
sometimes refer to polynomials of degree zero, that is, elements of K, as constants.
If deg f(x) > 0, then of course f(c) depends upon c.


DEFINITION 4.1 Let f(x) be a nonzero element of the polynomial ring K[x], and let c
be an element of K. Then c is called a root (or a zero) of f(x) if f(c) = 0.

Obviously f(x) cannot have any roots if deg f = 0. The following theorem is
fundamental:


THEOREM 4.1 Let f(x) be a nonzero element of K[x], and let c be a root of f(x) in K.
Then x − c divides f(x).
Proof. By the division algorithm (Proposition 3.2) there exist polynomials
q(x) and r(x) in K[x] such that f(x) = q(x) · (x − c) + r(x),
where either r(x) = 0 or deg r < deg (x − c) = 1. Hence r is an element
of K. Now the equation remains true if we substitute c for x (Theorem 2.4,
Corollary 3), and it becomes 0 = r(c). Therefore r = 0. Q.E.D.
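
The proof shows a little more than the theorem states: for any c in K the remainder of f(x) on division by x − c is the constant f(c). The following sketch illustrates this, assuming rational or integer coefficients stored as a list [a_0, ..., a_n] for a nonzero polynomial; the name divide_by_linear is illustrative, and the scheme used (often called synthetic division) is a swapped-in shortcut for the general division algorithm, valid because x − c is monic.

```python
from fractions import Fraction

def divide_by_linear(f, c):
    """Divide nonzero f = [a_0, ..., a_n] by x - c; return (quotient, remainder).

    Works down from the highest coefficient; the final value is f(c).
    """
    partial = []
    carry = Fraction(0)
    for a in reversed(f):
        carry = carry * c + a
        partial.append(carry)
    remainder = partial.pop()            # equals f(c)
    quotient = list(reversed(partial))   # coefficients of q(x), lowest first
    return quotient, remainder

f = [-6, 11, -6, 1]                      # x^3 - 6x^2 + 11x - 6

q, r = divide_by_linear(f, 2)
print(r == 0, q == [3, -4, 1])           # True True: 2 is a root, q = x^2 - 4x + 3

q, r = divide_by_linear(f, 4)
print(r == 6)                            # True: f(4) = 6, so x - 4 does not divide f
```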


COROLLARY If $c_1, c_2, \ldots, c_k$ are distinct roots of f(x) in K, then f(x) is divisible
by $(x - c_1)(x - c_2) \cdots (x - c_k)$. Therefore the number of distinct roots of f(x)
cannot exceed deg f.

All the irreducible polynomials $x - c_1, \ldots, x - c_k$ divide f(x), by the theorem,
and the corollary follows at once from Theorem 3.6 and Proposition 2.3. It
can also be proved directly. For let q(x) be the quotient of f(x) by $x - c_1$. Then
$f(x) = q(x) \cdot (x - c_1)$, and this equation remains true if we replace x by $c_2, \ldots, c_k$.
Thus $0 = q(c_j) \cdot (c_j - c_1)$, and by assumption $c_j - c_1 \neq 0$ for j = 2, ..., k.
Hence $c_2, c_3, \ldots, c_k$ are distinct roots of q(x), and the argument can be repeated
with q(x) in place of f(x). An induction is involved. We omit the details.


THEOREM 4.2 Let f(x) be a polynomial of degree n > 0 in C[x]. Then f(x) can be
factored as a product

(4.1)    $f(x) = a (x - c_1)(x - c_2) \cdots (x - c_n)$


where $a, c_1, c_2, \ldots, c_n$ are complex numbers. In fact, a is the highest coefficient
of f(x). Moreover (4.1) is unique apart from the order of the factors. The irreducible
polynomials in C[x] are the polynomials of degree 1.




Proof. The uniqueness of (4.1) follows from Theorem 3.6, and it is
obvious that $a$ is the coefficient of $x^n$. To prove the existence of such a
factorization we use induction. For $n = 1$ the assertion is trivial. Suppose
then that it is true for $n - 1$ (where $n > 1$). We show that it must
be true for $n$. In fact, $f(x)$ has at least one root $c_1$ in $C$, by Theorem 1.4,
Chap. 5. By Theorem 4.1, $x - c_1$ divides $f(x)$, say, $f(x) = q(x) \cdot (x - c_1)$.
Then $\deg q(x) = n - 1$, and by assumption $q(x)$ can be factored in the
form (4.1), and therefore so can $f(x)$. Q.E.D.


Naturally some of the factors $x - c$ in (4.1) may appear several times. If $x - c$
appears exactly $k$ times, then $c$ is said to be a root of $f(x)$ of multiplicity $k$, or to be
a $k$-fold root of $f(x)$.


COROLLARY Let $f(x)$ be a polynomial of degree $n > 0$ in $R[x]$. Then $f(x)$ can be
expressed as a product
\[ f(x) = a \cdot p_1(x) p_2(x) \cdots p_r(x) \tag{4.2} \]
where $p_1(x), \ldots, p_r(x)$ are monic irreducible polynomials in $R[x]$ of degree 1 or 2
and where $a$ is the highest coefficient of $f(x)$. The factorization (4.2) is unique apart
from the order of the factors. In particular, the irreducible polynomials in $R[x]$ are
those of degree 1 and certain polynomials of degree 2.

Proof. Uniqueness of (4.2) follows from Theorem 3.6. We use induction
on $n$ to prove the existence of such a factorization. For $n = 1$ this
is trivial. Now let $n$ be an integer $> 1$, and suppose that a factorization
of the stated type is possible for all polynomials in $R[x]$ of degree $< n$.
We show that the same is true for degree $n$. Let $f(x)$ have degree $n$ and
factor $f(x)$ in the form (4.1). If all the roots $c_1, c_2, \ldots, c_n$ are real,
then we are done. Otherwise let $c_1$, say, be a root which is not real. By
Theorem 1.3, Chap. 5, its complex conjugate $\bar{c}_1$ is also a root. Hence $\bar{c}_1$
must appear among the numbers $c_2, c_3, \ldots, c_n$, and so the right-hand
side of (4.1) must contain the product $(x - c_1)(x - \bar{c}_1)$. Call this polynomial
$p(x)$. Then $p(x) = x^2 - (c_1 + \bar{c}_1)x + c_1\bar{c}_1$, and the coefficients
here are real. Hence $p(x)$ is in $R[x]$, and it is plainly irreducible in $R[x]$.
From (4.1) it is clear that $p(x)$ divides $f(x)$. That is, $f(x) = p(x) \cdot q(x)$,
where $q(x)$ is the product of $a$ and the $n - 2$ other factors different from
$x - c_1$ and $x - \bar{c}_1$. The quotient $q(x)$ must have real coefficients. Apply
the division algorithm to $f(x)$ and $p(x)$ in $R[x]$, getting say $f(x) =
p(x) q_1(x) + r(x)$, where $q_1$ and $r$ have real coefficients, and either $r = 0$
or $\deg r < \deg p$. From the uniqueness part of Proposition 3.2 it follows
that $q(x) = q_1(x)$ and $r(x) = 0$. Thus $q(x)$ is a polynomial in $R[x]$ of
degree $n - 2$. If $n = 2$, then it is a constant (namely, $a$), and we are done.
If $n > 2$, then by assumption $q(x)$ has a factorization of the form (4.2),
and therefore so does $f(x) = p(x) \cdot q(x)$. Hence the theorem holds for all
$n > 0$, by mathematical induction. Q.E.D.


The corollary shows that the nonreal complex roots of a real polynomial must occur in
conjugate pairs. If the polynomial has odd degree, it must then have at least one real root.


EXAMPLE 1 The theorems above are often useful in determining prime factorizations
of polynomials. Let us find the prime factorization of $x^4 + 1$ in $R[x]$. This is
not entirely trivial, and in fact it gave some trouble to as eminent a mathematician
as G. W. Leibniz (1646-1716), one of the discoverers of calculus. Now the roots of
the polynomial $x^4 + 1$ are just the 4th roots of $-1$. According to Theorem 3.2,
Chap. 5, those roots in $C$ can be found by writing $-1$ in polar form: $-1 = \cos \pi +
i \sin \pi$. The 4th roots are then
\[ \cos \frac{\pi + 2k\pi}{4} + i \sin \frac{\pi + 2k\pi}{4} \qquad (k = 0, 1, 2, 3) \]
These are easily seen to be the four numbers $\frac{\sqrt{2}}{2}(\pm 1 \pm i)$. Therefore (using the
corollary of Theorem 4.1) we get
\[ \begin{aligned}
x^4 + 1 &= \Bigl(x - \tfrac{\sqrt{2}}{2}(1 + i)\Bigr)\Bigl(x - \tfrac{\sqrt{2}}{2}(1 - i)\Bigr)\Bigl(x + \tfrac{\sqrt{2}}{2}(1 + i)\Bigr)\Bigl(x + \tfrac{\sqrt{2}}{2}(1 - i)\Bigr) \\
&= (x^2 - \sqrt{2}\,x + 1)(x^2 + \sqrt{2}\,x + 1)
\end{aligned} \]
The first expression above is the prime factorization of $x^4 + 1$ in $C[x]$; the second is
its prime factorization in $R[x]$. The polynomial $x^4 + 1$ is irreducible in $Q[x]$, as is
easily seen. The factorization above can also be obtained from the not entirely
obvious observation that $x^4 + 1 = (x^2 + 1)^2 - 2x^2 = (x^2 + 1 + \sqrt{2}\,x)(x^2 + 1 - \sqrt{2}\,x)$.
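
The passage from the complex factorization to the real one can be checked numerically. The sketch below is only an illustration in floating-point arithmetic (so equalities hold up to rounding); the helper `poly_mul` and the variable names are our own. Each 4th root $c$ of $-1$ with positive imaginary part is paired with its conjugate to form the real quadratic $x^2 - 2\,\mathrm{Re}(c)\,x + |c|^2$, and the two quadratics are multiplied back together.

```python
import cmath
import math

def poly_mul(p, q):
    """Multiply two polynomials given as coefficient lists, lowest power first."""
    out = [0.0] * (len(p) + len(q) - 1)
    for i, a in enumerate(p):
        for j, b in enumerate(q):
            out[i + j] += a * b
    return out

# The four 4th roots of -1: cos((pi + 2k*pi)/4) + i sin((pi + 2k*pi)/4), k = 0..3
roots = [cmath.exp(1j * (math.pi + 2 * k * math.pi) / 4) for k in range(4)]

# Pair each root c in the upper half plane with its conjugate:
# (x - c)(x - conj(c)) = x^2 - 2 Re(c) x + |c|^2, which has real coefficients.
quadratics = [[abs(c) ** 2, -2 * c.real, 1.0] for c in roots if c.imag > 0]

# Multiply the two real quadratics; up to rounding this is x^4 + 1.
product = poly_mul(quadratics[0], quadratics[1])
```

Up to rounding error, `quadratics` holds the coefficients of $x^2 - \sqrt{2}\,x + 1$ and $x^2 + \sqrt{2}\,x + 1$, and `product` holds those of $x^4 + 1$.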

Proof of Wilson's Theorem. As another application we now give a proof
of Theorem 10.6, Chap. 2. We wish to show that if $p$ is prime, then
$(p - 1)! \equiv -1 \pmod{p}$. To do so we take for $K$ the field $Z_p$ of residue
classes of integers modulo $p$ (Sec. 10, Chap. 2). Since the theorem is easily
verified for $p = 2$, we shall assume that $p > 2$; hence $p$ is odd. Now let $\bar{c}$
denote any nonzero element of $Z_p$. (Remember, $\bar{c}$ is a residue class of
integers modulo $p$.) From Fermat's theorem (Theorem 10.5, Chap. 2) it
follows at once that $\bar{c}^{\,p-1} = \bar{1}$, where $\bar{1}$ is the unit element of $Z_p$. Hence
every nonzero element of $Z_p$ is a root of the polynomial $x^{p-1} - \bar{1}$ in $Z_p[x]$.
Letting $\bar{1}, \bar{2}, \ldots, \overline{p-1}$ denote the residue classes modulo $p$ containing
$1, 2, \ldots, p - 1$, respectively, we have from the corollary of Theorem 4.1
\[ x^{p-1} - \bar{1} = (x - \bar{1})(x - \bar{2}) \cdots (x - \overline{p-1}) \]
since $x^{p-1} - \bar{1}$ must be divisible by the right-hand side, and both sides
are monic of the same degree. Now put $\bar{0}$ for $x$ in this equation ($\bar{0}$ being the zero
element of $Z_p$). We then get
\[ -\bar{1} = (-\bar{1})(-\bar{2}) \cdots (-\overline{p-1}) = (-1)^{p-1}\,\bar{1} \cdot \bar{2} \cdots \overline{p-1} = \bar{1} \cdot \bar{2} \cdots \overline{p-1}, \]
since $p - 1$ is even. This equation means precisely that $-1 \equiv (p - 1)! \pmod{p}$.
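
Both steps of this proof can be checked by direct computation for small primes. The following is a minimal sketch, assuming polynomials over $Z_p$ are stored as lists of residues with the lowest power first; the function names are hypothetical and chosen only for this illustration.

```python
def poly_mul_mod(p_coeffs, q_coeffs, m):
    """Multiply two polynomials with coefficients reduced modulo m (lowest power first)."""
    out = [0] * (len(p_coeffs) + len(q_coeffs) - 1)
    for i, a in enumerate(p_coeffs):
        for j, b in enumerate(q_coeffs):
            out[i + j] = (out[i + j] + a * b) % m
    return out

def product_of_linear_factors(p):
    """Expand (x - 1)(x - 2)...(x - (p - 1)) with coefficients in Z_p."""
    result = [1]
    for c in range(1, p):
        result = poly_mul_mod(result, [(-c) % p, 1], p)
    return result

def wilson_holds(m):
    """Return True when (m - 1)! is congruent to -1 modulo m."""
    factorial = 1
    for k in range(1, m):
        factorial = factorial * k % m
    return factorial == (m - 1) % m

# For p = 7 the expanded product should be x^6 - 1, i.e. x^6 + 6 in Z_7[x].
expanded = product_of_linear_factors(7)
```

For $p = 7$ the list `expanded` has constant term 6 (the class of $-1$), leading coefficient 1, and zeros in between, and `wilson_holds` returns `True` for the primes tried; it returns `False` for a composite modulus such as 9.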


We mention another important theorem. It is a generalization of Theorem 3.3,
Chap. 5.


THEOREM 4.3 Let $h$ be an isomorphism from a field $K$ to a field $K'$, and let $x$, $y$ be
variable quantities over $K$ and $K'$, respectively. Let $f(x)$ and $g(x)$ be two polynomials
in $K[x]$, and let $f'(y)$, $g'(y)$ be the polynomials in $K'[y]$ obtained by replacing $x$ by $y$
and by replacing each coefficient $a$ by the corresponding element $h(a)$ in $K'$. If $g(x)$
divides $f(x)$, then also $g'(y)$ divides $f'(y)$.
Proof. Consider the mapping $h^*$ from $K[x]$ to $K'[y]$ defined as follows:
If $a_0 + a_1 x + \cdots + a_n x^n$ is any element of $K[x]$, then by definition it is
sent by $h^*$ into $a'_0 + a'_1 y + \cdots + a'_n y^n$, where $a'_i = h(a_i)$. Clearly for
elements of $K$ the mapping $h^*$ is the same as the given mapping $h$; and the
polynomials $f'(y)$, $g'(y)$ of the theorem are the result of applying $h^*$ to $f(x)$
and $g(x)$. It is a very straightforward matter to verify that $h^*$ is an isomorphism
from $K[x]$ to $K'[y]$. If $f(x) = g(x) \cdot q(x)$, and if we let $q'(y)$ denote
the result of applying $h^*$ to $q(x)$, then $f'(y) = g'(y) \cdot q'(y)$, since $h^*$ is
an isomorphism, and so $g'(y)$ divides $f'(y)$. Q.E.D.


COROLLARY If $c$ is a root of $f(x)$ in $K$, then $c' = h(c)$ is a root of $f'(y)$ in $K'$.
Proof. By Theorem 4.1, $x - c$ divides $f(x)$. Hence, by Theorem 4.3,
$y - c'$ divides $f'(y)$. Q.E.D.


EXAMPLE 2 In particular, if we take $K = K' = C$, $x = y$, and for $h$ the isomorphism
that sends every element of $C$ into its complex conjugate, and if $c$ is a root of
a polynomial $f(x)$ with real coefficients, then from the corollary it follows that
$\bar{c}$ is also a root of $f(x)$. This is Theorem 3.3, Chap. 5.


EXERCISES 

1. Find the prime factorizations of the following polynomials in $Q[x]$, $R[x]$, and
$C[x]$:

(@) 2-2 ) 2 + 42? + Be 42 
(e) x3 4 22? ~1 (d) 422? +22 41 

2. Let $c$ be an element of a field $K$. Let $S_c$ be the mapping $K[x] \to K$ that sends
any element $f(x)$ in $K[x]$ into $f(c)$, the element of $K$ obtained by substituting $c$ for $x$.
Let $A$ denote the subset of $K[x]$ consisting of all elements which are sent by $S_c$ into
zero. Prove that $A$ is a subring of $K[x]$. Give a simple criterion (not the one
already given) for a polynomial to be in $A$.

3. Show that if ¢ is any real number, then x” ~ c? divides 22° ~ e for any posi- 
tive integer n. Find the prime factorization in R[x] of 22” —¢. [Hint: Use the 
method of Example 1] 

4. Let $f(x)$ and $g(x)$ be polynomials in $K[x]$, and suppose that they have a root $c$
in common in some field $L$ containing $K$ as a subfield. Prove that $f(x)$ and $g(x)$ then
have a g.c.d. of positive degree in $K[x]$.

5. Consider the mapping $D$ of a polynomial ring $K[x]$ to itself defined as follows:
If $f(x) = a_0 + a_1 x + a_2 x^2 + \cdots + a_n x^n$, then $Df(x) = a_1 + 2a_2 x + 3a_3 x^2 +
\cdots + n a_n x^{n-1}$. [$Df(x)$ is called the derivative of $f(x)$.] Prove that $D(f(x) +
g(x)) = Df(x) + Dg(x)$ and $D(f(x) \cdot g(x)) = Df(x) \cdot g(x) + f(x) \cdot Dg(x)$. Prove
that an element $f(x)$ of $C[x]$ has a repeated root if and only if $f(x)$ and $Df(x)$ are not
relatively prime.

*6. Let $K$ be a subfield of a field $L$, and let $c$ be an element of $L$ which is a root of
an irreducible polynomial $f(x)$ of degree $n$ in $K[x]$. Prove that every element of
$K[c]$ can be expressed in one and only one way in the form $a_0 + a_1 c + \cdots +
a_{n-1} c^{n-1}$, where $a_0, a_1, \ldots, a_{n-1}$ are in $K$ (see Definition 2.1). Prove that $K[c]$ is a
field. [Hint: If $g(x)$ is a nonzero polynomial of degree $< n$, show that there exist
polynomials $r(x)$ and $s(x)$ in $K[x]$ such that $r(x) \cdot g(x) + s(x) \cdot f(x) = 1$.] If $c'$ is
another root of $f(x)$ in $L$, prove that $K[c]$ and $K[c']$ are isomorphic.

7. Let $K$ be a subfield of a field $L$, and let $c$ be an element of $L$ which is algebraic
over $K$ (see Definition 2.2). Let $A$ be the set of all polynomials in $K[x]$ having $c$ as a
root. Prove that $A$ contains a unique monic polynomial $f(x)$ of lowest degree and
that it divides every element of $A$. Prove that if $c'$ is another element of $L$ which is
algebraic over $K$ and if $A'$ is the similarly defined subset of $K[x]$, then $A = A'$ if
and only if $c'$ is also a root of $f(x)$.

8. Let $B$ be the set of all polynomials of degrees 0 and 1 in $R[x]$. The sum of two
elements of $B$ is clearly again in $B$. Define a product $*$ in $B$ as follows: If $f(x)$ and
$g(x)$ are two elements of $B$, then $f(x) * g(x)$ = remainder of $f(x) \cdot g(x)$ upon division
by $x^2 + 1$. The remainder must have degree $< 2$, and so $f(x) * g(x)$ is again in $B$.
Prove that $B$ with this product and its usual addition is a field of complex numbers
(Definition 1.1, Chap. 5).

*9. Let $K$ be a field, and let $p(x)$ be an irreducible polynomial of degree $n$ in
$K[x]$. Let $B$ consist of all polynomials in $K[x]$ of degree $< n$. For two elements
$f(x)$, $g(x)$ in $B$ define a product $*$ by $f(x) * g(x)$ = remainder of $f(x) \cdot g(x)$ upon
division by $p(x)$. Prove that $B$ with its usual addition and with the product $*$ is a
field containing $K$ as a subfield and containing a root of $p(x)$.


10. Let $f(x) = x^n + a_1 x^{n-1} + \cdots + a_{n-1} x + a_n$ be a polynomial with
coefficients in $Z$. Prove that if $f(x)$ has a rational root $r$, then $r$ must be an integer.
Conclude then that the congruence $x^n + a_1 x^{n-1} + \cdots + a_n \equiv 0 \pmod{m}$ has a
solution in $Z$ for every integer $m > 1$. From this prove that $x^3 - 8x + 6$ cannot
have any rational roots (try $m = 5$; you only have to test five values of $x$). Show
that $4x^3 + 2x - 3$ has no rational roots (multiply the equation by 2 and replace $x$
by $y = 2x$).

11. Prove that $x^2 + x + 1$ divides $x^{3a} + x^{3b+1} + x^{3c+2}$ for any non-negative
integers $a$, $b$, $c$. Can you find a generalization of this fact where the integer 3 is
replaced by an arbitrary one?




12. Find the prime factorization of $(x + 1)^n + (x - 1)^n$ in $R[x]$.

13. The polynomial x4 — x3 — 22° + 4x — 4 has the root 1+ 7%. Find the 
other three roots in C. 

14. Let $a_0, a_1, \ldots, a_n$ be distinct elements of a field $K$, and let $b_0, b_1, \ldots, b_n$
be arbitrary elements of $K$. Prove that there is one and only one polynomial $f(x)$ of
degree $\le n$ with coefficients in $K$ such that $f(a_i) = b_i$ for $i = 0, 1, \ldots, n$. Prove that
$f(x)$ is given by the formula
\[ f(x) = \sum_{j=0}^{n} b_j \, \frac{(x - a_0) \cdots (x - a_{j-1})(x - a_{j+1}) \cdots (x - a_n)}{(a_j - a_0) \cdots (a_j - a_{j-1})(a_j - a_{j+1}) \cdots (a_j - a_n)} \]
This is called Lagrange's interpolation formula. Find the polynomial $f$ of degree 3
in $Q[x]$ such that $f(1) = 1$, $f(2) = 8$, $f(3) = 6$, $f(4) = 10$.

*15. For any sequence $\{a_i\} = \{a_0, a_1, a_2, a_3, \ldots\}$ of rational numbers, denote by
$\Delta\{a_i\}$ the sequence of differences $\{a_1 - a_0, a_2 - a_1, a_3 - a_2, \ldots\}$. Define $\Delta^n$
inductively by $\Delta^n\{a_i\} = \Delta(\Delta^{n-1}\{a_i\})$. Thus $\Delta^2\{a_i\}$ is the sequence $\{a_2 - 2a_1 + a_0,
a_3 - 2a_2 + a_1, a_4 - 2a_3 + a_2, \ldots\}$. Now let $f(x)$ be a polynomial of degree $n$
with rational coefficients, and set $a_i = f(i)$ for $i = 0, 1, 2, \ldots$. Prove that
$\Delta^r\{a_i\} = 0$ for $r > n$. Conversely, if $\{a_i\}$ is a sequence of rational numbers such
that $\Delta^{n+1}\{a_i\} = 0$, prove that there is a polynomial $f(x)$ of degree at most $n$, with
rational coefficients, such that $a_i = f(i)$ for $i = 0, 1, 2, \ldots$.

16. Using the results of the preceding exercise, prove that there is a unique polynomial
$f(x)$ of degree $r + 1$, with rational coefficients, such that $f(n) = 1^r + 2^r +
3^r + \cdots + n^r$ for $n = 1, 2, 3, \ldots$.


5. Polynomials in several variables 


We begin with some rather obvious generalizations of the definitions of Sec. 2; but
for simplicity we shall restrict our considerations to integral domains, rather than
arbitrary rings. Then let $D$ be an integral domain, and let $t_1, t_2, \ldots, t_n$ be elements
of $D$. Any (finite) sum of terms of the type
\[ a\, t_1^{k_1} t_2^{k_2} \cdots t_n^{k_n} \qquad a \text{ in } D;\ k_1, \ldots, k_n \text{ integers} \ge 0 \tag{5.1} \]
is called a polynomial in $t_1, t_2, \ldots, t_n$ with coefficients in $D$. A term of the type
(5.1) is called a monomial; the integer $k_1 + k_2 + \cdots + k_n$ is its degree.

PROPOSITION 5.1 Let $D$ be a subring of an integral domain $E$, and let $t_1, t_2, \ldots, t_n$
be elements of $E$. Let $D[t_1, t_2, \ldots, t_n]$ denote the set of all elements in $E$ which can be
expressed as polynomials in $t_1, \ldots, t_n$ with coefficients in $D$. Then $D[t_1, t_2, \ldots, t_n]$
is a subring of $E$, and it is the smallest subring containing $D$ and $t_1, t_2, \ldots, t_n$.
Furthermore, if $k < n$ and if $D' = D[t_1, t_2, \ldots, t_k]$, then $D[t_1, t_2, \ldots, t_n] =
D'[t_{k+1}, \ldots, t_n]$.

The proof of the first part is essentially the same as the proof of Proposition 2.1
and is omitted. The last part simply says that a polynomial in $t_1, \ldots, t_n$ (with
coefficients in $D$) can be considered as a polynomial in the last $n - k$ elements
$t_{k+1}, \ldots, t_n$ with coefficients which are polynomials in $t_1, \ldots, t_k$, and vice versa.
That is obvious.


DEFINITION 5.1 Let $D$ be a subring of an integral domain $E$, and let $x_1, x_2, \ldots, x_n$
be elements of $E$. Then they are said to be independent variables (or independent
indeterminates) over $D$ if no polynomial in $x_1, x_2, \ldots, x_n$ with coefficients in $D$ is
zero unless all the coefficients are zero. In that case, $D[x_1, x_2, \ldots, x_n]$ is called the
ring of polynomials in $x_1, \ldots, x_n$ with coefficients in $D$.

In other words, every element of $D[x_1, x_2, \ldots, x_n]$ can be expressed uniquely,
apart from terms with zero coefficients, as a polynomial in $x_1, \ldots, x_n$ with
coefficients in $D$. That is, two elements of $D[x_1, \ldots, x_n]$ are equal if and only if
every monomial $x_1^{k_1} x_2^{k_2} \cdots x_n^{k_n}$ has the same coefficient in both elements. The
degree of a polynomial is the greatest of the degrees of the monomials that occur in
it with nonzero coefficients.

For brevity we shall say that elements $x_1, x_2, \ldots, x_n$ are independent variables
over an integral domain $D$ if they are elements in some integral domain containing
$D$ as a subring and if they satisfy the conditions of Definition 5.1. The ring
$D[x_1, \ldots, x_n]$ is called the ring of polynomials in $x_1, \ldots, x_n$ over $D$.

Corresponding to Theorem 2.4 we have the following result: 


THEOREM 5.2 Let $D$ be an integral domain and let $x_1, x_2, \ldots, x_n$ be independent
variables over $D$. Let $t_1, t_2, \ldots, t_n$ be arbitrary elements of any integral domain
containing $D$ as a subring. Then the operation that sends every polynomial
$f(x_1, x_2, \ldots, x_n)$ of $D[x_1, \ldots, x_n]$ into the element $f(t_1, t_2, \ldots, t_n)$ obtained
by replacing each $x_i$ by $t_i$ $(i = 1, \ldots, n)$ is a ring-homomorphism from
$D[x_1, \ldots, x_n]$ to $D[t_1, \ldots, t_n]$.

The operation in question defines a mapping from $D[x_1, \ldots, x_n]$ to $D[t_1,
\ldots, t_n]$ since every element of $D[x_1, \ldots, x_n]$ can be written as a sum of distinct
monomials $a \cdot x_1^{k_1} \cdots x_n^{k_n}$ ($a$ in $D$), and that representation is unique except for
terms with coefficient zero. Therefore the element $f(t_1, \ldots, t_n)$ is uniquely determined
by the element $f(x_1, \ldots, x_n)$ and does not depend upon the particular way
in which the latter is written. The fact that the mapping so defined is a ring-homomorphism
follows from the same argument as in the one-variable case of
Theorem 2.4. We omit the details.


COROLLARY If $t_1, \ldots, t_n$ are also independent variables over $D$, then the mapping
$D[x_1, \ldots, x_n] \to D[t_1, \ldots, t_n]$ of Theorem 5.2 is an isomorphism.
For then the mapping is clearly one-to-one, by Definition 5.1.


THEOREM 5.3 Let $D$ be an integral domain. Then there exists an integral domain $E$
containing $D$ as a subring and containing $n$ independent variables $x_1, x_2, \ldots, x_n$
over $D$, where $n$ is a given positive integer.

By Theorem 2.5 there is an integral domain $E_1$ containing $D$ as a subring and
containing an independent variable $x_1$ over $D$. By the same theorem there is an
integral domain $E_2$ containing $E_1$ as a subring and containing an independent
variable $x_2$ over $E_1$. Then $E_2$ also contains $D$ as a subring, and clearly $x_1$ and $x_2$ are
independent variables over $D$. Continuing in this way one easily proves Theorem
5.3 by induction.


The theorem guarantees the existence of a polynomial ring $D[x_1, x_2, \ldots, x_n]$
in any number of independent variables.

Now let $s$ denote a permutation of the integers $1, 2, \ldots, n$. That is, $s$ is a one-to-one
mapping of that set to itself. Then $x_{s(1)}, x_{s(2)}, \ldots, x_{s(n)}$ are the elements
$x_1, x_2, \ldots, x_n$ in some other order (unless $s$ is the identity permutation, of
course). If in the corollary to Theorem 5.2 we take $t_1 = x_{s(1)}, \ldots, t_n = x_{s(n)}$,
then it follows that the mapping from $D[x_1, \ldots, x_n]$ to itself obtained by applying
the permutation $s$ to $x_1, x_2, \ldots, x_n$ is an isomorphism.


EXAMPLE 1 Consider the case $n = 2$, and let the independent variables be $x$ and
$y$. Every element $f(x, y)$ of $D[x, y]$ can then be written in the form
\[ f(x, y) = \sum_{k, l \ge 0} a_{kl}\, x^k y^l \]
where $\sum$ indicates a finite sum and where the $a_{kl}$ are elements of $D$. The only
permutation of $x$ and $y$ other than the identity permutation is of course that one
which interchanges $x$ and $y$. From what has just been said above, the mapping
that sends $f(x, y)$ into
\[ f(y, x) = \sum a_{kl}\, y^k x^l \]
is an isomorphism from $D[x, y]$ to itself. For example it sends $x + 1$ into $y + 1$
and $x^2 + xy^2 - y + x$ into $y^2 + yx^2 - x + y$, etc.
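
The interchange of two variables is easy to experiment with. The sketch below is only an illustration, assuming integer coefficients and a representation of our own choosing: a polynomial in $x$ and $y$ is a dictionary sending the exponent pair $(k, l)$ of $x^k y^l$ to its coefficient. It checks on one sample pair that swapping the variables respects products, as a ring-homomorphism must.

```python
from collections import defaultdict

def mul(f, g):
    """Multiply two polynomials in x, y stored as {(k, l): coefficient}."""
    out = defaultdict(int)
    for (k1, l1), a in f.items():
        for (k2, l2), b in g.items():
            out[(k1 + k2, l1 + l2)] += a * b
    return {mono: c for mono, c in out.items() if c != 0}

def swap(f):
    """Interchange x and y: the monomial x^k y^l becomes x^l y^k."""
    return {(l, k): a for (k, l), a in f.items()}

# Sample polynomials (chosen for illustration):
f = {(1, 0): 1, (0, 0): 1}                          # x + 1
g = {(2, 0): 1, (1, 2): 1, (0, 1): -1, (1, 0): 1}   # x^2 + x y^2 - y + x

# Swapping after multiplying agrees with multiplying after swapping.
left = swap(mul(f, g))
right = mul(swap(f), swap(g))
```

Applying `swap` twice returns the original dictionary, which reflects the fact that the mapping is its own inverse and hence one-to-one and onto.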


DEFINITION 5.2 Let $D[x_1, x_2, \ldots, x_n]$ be a ring of polynomials in $x_1, \ldots, x_n$ over
an integral domain $D$. Then a polynomial $f(x_1, x_2, \ldots, x_n)$ in $D[x_1, \ldots, x_n]$ is
called symmetric if $f(x_1, x_2, \ldots, x_n) = f(x_{s(1)}, x_{s(2)}, \ldots, x_{s(n)})$ for every permutation
$s$ of $1, 2, \ldots, n$.


It is easily seen that the symmetric polynomials form a subring of $D[x_1, \ldots, x_n]$.


EXAMPLE 2 For two independent variables $x$ and $y$ the polynomials $x + y$, $x^2 + y^2$,
$xy$, $x^2 + 2xy + y^2$ are symmetric.

For three independent variables $x_1, x_2, x_3$ the polynomials $(x_1 - x_2)^2 (x_1 - x_3)^2
(x_2 - x_3)^2$, $x_1 x_2 x_3$, $x_1 + x_2 + x_3$, $x_1 x_2 + x_1 x_3 + x_2 x_3$ are all symmetric, but
$(x_1 - x_2)(x_1 - x_3)(x_2 - x_3)$ is not symmetric,† for the permutation that interchanges 1 and 2
causes it to change sign, clearly.


† Except in the case of a coefficient domain $D$ in which $1 = -1$, for example $Z_2$.




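For any elements $t_1, t_2, \ldots, t_n$ of an integral domain, we define the symbols
$\sigma_1(t), \ldots, \sigma_n(t)$ by the formulas
\[ \begin{aligned}
\sigma_1(t) &= t_1 + t_2 + \cdots + t_n = \sum_i t_i \\
\sigma_2(t) &= t_1 t_2 + t_1 t_3 + \cdots + t_1 t_n + t_2 t_3 + \cdots + t_{n-1} t_n = \sum_{i<j} t_i t_j \\
\sigma_3(t) &= t_1 t_2 t_3 + t_1 t_2 t_4 + \cdots = \sum_{i<j<k} t_i t_j t_k \\
&\ \ \vdots \\
\sigma_n(t) &= t_1 t_2 \cdots t_n
\end{aligned} \tag{5.2} \]
In general, $\sigma_k(t)$ consists of the sum of all the different products of $k$ of the elements
$t_1, \ldots, t_n$ with different indices. The elements $\sigma_1(t), \ldots, \sigma_n(t)$ are called the
elementary symmetric functions of $t_1, t_2, \ldots, t_n$. It is easily seen that if $y$ is any
element of the integral domain, then
\[ (y - t_1)(y - t_2) \cdots (y - t_n) = y^n - \sigma_1(t)\, y^{n-1} + \sigma_2(t)\, y^{n-2} - \cdots + (-1)^n \sigma_n(t) \tag{5.3} \]
Hence, we have the following theorem: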
THEOREM 5.4 Let $f(x)$ be a monic polynomial of degree $n > 0$ in a polynomial ring
$K[x]$, and suppose that $f(x)$ has $n$ roots $c_1, c_2, \ldots, c_n$ in $K$. That is, suppose that
$f(x)$ factors into
\[ f(x) = (x - c_1)(x - c_2) \cdots (x - c_n) \]
Then the coefficient of $x^{n-k}$ in $f(x)$ is $(-1)^k \cdot \sigma_k(c)$.
The converse problem of expressing the roots of $f(x)$ in terms of the coefficients is
discussed in Sec. 6.
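
The relation between roots and coefficients can be confirmed on a small example. What follows is a minimal sketch using integer roots; the function names are ours and serve only as an illustration. It computes the elementary symmetric functions directly from their definition as sums of products with distinct indices, expands the product of the factors $x - c_i$, and compares the two.

```python
from itertools import combinations
from math import prod

def elementary_symmetric(ts, k):
    """sigma_k(t): the sum of all products of k of the elements with distinct indices."""
    return sum(prod(combo) for combo in combinations(ts, k))

def expand_monic(roots):
    """Coefficients of (x - c_1)(x - c_2)...(x - c_n), highest power first."""
    coeffs = [1]
    for c in roots:
        # Multiply the current polynomial by (x - c):
        # shifting left multiplies by x, shifting right aligns the -c multiple.
        coeffs = [a - c * b for a, b in zip(coeffs + [0], [0] + coeffs)]
    return coeffs

roots = [2, -1, 5, 3]           # sample roots, chosen for illustration
coeffs = expand_monic(roots)    # coeffs[k] is the coefficient of x^(n - k)
signed_sigmas = [(-1) ** k * elementary_symmetric(roots, k)
                 for k in range(len(roots) + 1)]
```

The two lists `coeffs` and `signed_sigmas` agree entry by entry, with the convention that the empty product gives the leading coefficient 1 for $k = 0$.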


The following theorem shows that if $x_1, x_2, \ldots, x_n$ are independent variables
over an integral domain $D$, then every symmetric polynomial in $D[x_1, x_2, \ldots, x_n]$
can be expressed as a polynomial in the elementary symmetric functions $\sigma_1(x)$,
$\sigma_2(x), \ldots, \sigma_n(x)$. The proof is omitted.

THEOREM 5.5 Let $D[x_1, x_2, \ldots, x_n]$ be a polynomial ring over an integral domain $D$.
Then the set of symmetric polynomials in $D[x_1, \ldots, x_n]$ coincides with the subring
$D[\sigma_1(x), \sigma_2(x), \ldots, \sigma_n(x)]$.


EXAMPLE 3 For $n = 3$ the elementary symmetric functions $\sigma_1(x)$, $\sigma_2(x)$, $\sigma_3(x)$ are
$x_1 + x_2 + x_3$, $x_1 x_2 + x_1 x_3 + x_2 x_3$, and $x_1 x_2 x_3$. The polynomial $f(x_1, x_2, x_3) =
x_1^2 + x_2^2 + x_3^2 + x_1 + x_2 + x_3 + 2$ is easily seen to be equal to $\sigma_1(x)^2 + \sigma_1(x) -
2\sigma_2(x) + 2$.
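
An identity of this kind between polynomials with integer coefficients can be spot-checked by evaluating both sides at many integer points. The sketch below does this for the symmetric polynomial $x_1^2 + x_2^2 + x_3^2 + x_1 + x_2 + x_3 + 2$ and the expression $\sigma_1^2 + \sigma_1 - 2\sigma_2 + 2$; it is a numerical check under our own choice of sample points, not a proof, and the function names are illustrative.

```python
from itertools import product as cartesian

def sigma1(x1, x2, x3):
    return x1 + x2 + x3

def sigma2(x1, x2, x3):
    return x1 * x2 + x1 * x3 + x2 * x3

def f(x1, x2, x3):
    """The symmetric polynomial x1^2 + x2^2 + x3^2 + x1 + x2 + x3 + 2."""
    return x1 ** 2 + x2 ** 2 + x3 ** 2 + x1 + x2 + x3 + 2

def in_terms_of_sigmas(x1, x2, x3):
    """The same polynomial written through sigma_1 and sigma_2."""
    s1, s2 = sigma1(x1, x2, x3), sigma2(x1, x2, x3)
    return s1 ** 2 + s1 - 2 * s2 + 2

# Compare the two expressions on every integer point of a small cube.
mismatches = [pt for pt in cartesian(range(-3, 4), repeat=3)
              if f(*pt) != in_terms_of_sigmas(*pt)]
```

The list `mismatches` comes out empty, as expected from $\sigma_1^2 - 2\sigma_2 = x_1^2 + x_2^2 + x_3^2$.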


REMARK. If for any elements $t_1, t_2, \ldots, t_n$ we define
\[ s_1(t) = \sum_i t_i, \qquad s_2(t) = \sum_i t_i^2, \qquad \ldots, \qquad s_n(t) = \sum_i t_i^n \tag{5.4} \]
then it can be shown that Theorem 5.5 holds with $s_1(x), s_2(x), \ldots, s_n(x)$ in place
of $\sigma_1(x), \sigma_2(x), \ldots, \sigma_n(x)$ if $D$ is a field of characteristic zero (cf. Exercise 6, Sec. 2,
Chap. 3; see also Exercise 8 below). For example, in $D[x_1, x_2]$ we have $\sigma_1(x) =
s_1(x)$, $\sigma_2(x) = \frac{1}{2}(s_1^2 - s_2)$.


Now let $K$ be a field and consider a polynomial ring $K[x_1, x_2, \ldots, x_n]$ in $n$
variables over $K$. This is an integral domain, and of course the definitions at the
beginning of Sec. 3 apply. It is therefore reasonable to ask whether any analogue
of Theorem 3.6 concerning unique factorization holds for polynomials in $n$ variables.
Such a theorem does indeed hold and is stated below. It can be proved fairly easily
by induction, using Theorem 3.6, but we shall not give the proof here. In the case
of polynomials in one variable, we were able to avoid annoying constant factors
(nonzero elements of $K$) by "normalizing" the irreducible polynomials, i.e., by
taking them to be monic. There is no very reasonable notion analogous to monic
for polynomials in more than one variable, and therefore we shall not attempt to
normalize irreducible polynomials for $n > 1$.


THEOREM 5.6 Any nonconstant polynomial $f(x_1, \ldots, x_n)$ in $K[x_1, x_2, \ldots, x_n]$
can be expressed as a product
\[ f = P_1 P_2 \cdots P_r \]
where $P_1, P_2, \ldots, P_r$ are irreducible polynomials in $K[x_1, \ldots, x_n]$. The factorization
is unique apart from constant factors and the order of the factors.


EXERCISES 

1. Let $x$, $y$ be independent variables over a field $K$. Prove that $x^2 + y$ and
$ax + by$ are irreducible in $K[x, y]$ for any $a$, $b$ in $K$ not both zero. What is the
g.c.d. $(x, y)$?

2. Prove that if $n$ is an odd positive integer not divisible by 3, then $xy \cdot (x^2 +
xy + y^2)$ divides $(x + y)^n - x^n - y^n$ in $K[x, y]$.

3. Prove that $(x + y)^7 - x^7 - y^7 = 7xy \cdot (x + y)(x^2 + xy + y^2)^2$.

4. Prove that there are elements $X$, $Y$ in $R[x_1, \ldots, x_n, y_1, \ldots, y_n]$ such that
\[ \prod_{i=1}^{n} (x_i^2 + y_i^2) = X^2 + Y^2 \]
where $x_1, \ldots, x_n, y_1, \ldots, y_n$ are independent variables.


5. Let $x^2 + ax + b$ be a polynomial in $C[x]$, and let $r_1$, $r_2$ be its roots. Express
ry + rand r;* + rs* in terms of a and 6. 




6. Taking three independent variables $x_1, x_2, x_3$ express the polynomials $\sigma_1(x)$,
$\sigma_2(x)$, $\sigma_3(x)$ of (5.2) as polynomials in $s_1(x)$, $s_2(x)$, $s_3(x)$ of (5.4), and vice versa.
Express $x_1^4 + x_2^4 + x_3^4$ as a polynomial in $\sigma_1, \sigma_2, \sigma_3$.

7, Let wi, 2, .. . , ws be the roots of z* = 1 in Clz). Prove that w: + w+ 

+ tw, = 0. 
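*8. Prove the "Newton formulas" concerning the symmetric polynomials $\sigma_i$
and $s_i$ in the variables $x_1, x_2, \ldots, x_n$:
\[ s_k - \sigma_1 s_{k-1} + \sigma_2 s_{k-2} - \cdots + (-1)^{k-1} \sigma_{k-1} s_1 + (-1)^k k\, \sigma_k = 0 \qquad (k \le n) \]
\[ s_k - \sigma_1 s_{k-1} + \sigma_2 s_{k-2} - \cdots + (-1)^n \sigma_n s_{k-n} = 0 \qquad (k > n) \]
Conclude that $K[\sigma_1, \ldots, \sigma_n] = K[s_1, \ldots, s_n]$ provided every element of $K$ is
divisible by $2, \ldots, n$. Compare with Theorem 5.5.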

9. Let the roots of f(z) = a" + gat! +++ ++ a,berndy ... yt. Express 
@e + Dee +1) --- G2 +1) im terms of a1, an... , ay. (Hint: Consider 


LOK D4 
6. Polynomials of degree less than 5 


Theorem 5.4 shows that the coefficients of a polynomial can be calculated very
easily from the roots. The converse problem of finding a formula which gives the
roots in terms of the coefficients is much more difficult. Indeed it is unsolvable in
general by purely algebraic operations. In this section we shall show that such
formulas do exist for polynomials of degree less than 5. We shall confine our
attention to polynomials with complex coefficients. Since the roots of a polynomial
are unaltered if we divide the polynomial by its highest coefficient, we can limit
the discussion to monic polynomials. For polynomials of degree 1 the problem at
hand is trivial, and we consider first the case of quadratic equations.


QUADRATIC EQUATIONS The problem in this case is to solve an equation of the type
\[ x^2 + bx + c = 0 \tag{6.1} \]
where $b$, $c$ are any complex numbers. The answer of course is the well-known quadratic
formula. It is instructive to carry out the steps, however. If in (6.1) we put
$x = y - b/2$, then we get
\[ y^2 + \Bigl(c - \frac{b^2}{4}\Bigr) = 0 \]
as is easily seen. The point here is that we got rid of the term involving the first
power of the unknown. To solve this last equation we simply have to find two
square roots of the number $b^2/4 - c$, and we have seen how to do that in Theorem
3.3, Chap. 5. Calling one of them $\sqrt{b^2/4 - c}$, then the other is $-\sqrt{b^2/4 - c}$, and
the solution is $y = \pm\sqrt{b^2/4 - c}$, or†
\[ x = -\frac{b}{2} \pm \frac{1}{2}\sqrt{b^2 - 4c} \tag{6.2} \]

† The notation here is rather ambiguous: $x$ is understood to be an indeterminate over $C$,
and therefore no such equation as (6.2) is possible. The "equation" should be interpreted
as meaning that the numbers on the right are what must be put for $x$ in (6.1) to get an
equality.


CUBIC EQUATIONS Now let us take up equations of the type
\[ x^3 + ax^2 + bx + c = 0 \tag{6.3} \]
Just as in (6.1) we can make the second-degree term disappear by putting $x =
y - a/3$. There results a new equation
\[ y^3 + py + q = 0 \tag{6.4} \]
in which $p$ and $q$ can easily be expressed in terms of $a$, $b$, $c$. The exact formulas
are of no importance here.
To solve (6.4) we put $y = u + v$. Then
\[ y^3 = u^3 + v^3 + 3uv \cdot (u + v) = u^3 + v^3 + 3uvy \]
and so (6.4) becomes
\[ u^3 + v^3 + (3uv + p) \cdot y + q = 0 \tag{6.5} \]
We now suppose that $u$ and $v$ satisfy the equation
\[ 3uv + p = 0 \tag{6.6} \]
Then from (6.5) and (6.6) we get
\[ u^3 + v^3 = -q \qquad u^3 v^3 = -p^3/27 \tag{6.7} \]
Therefore $(t - u^3)(t - v^3) = t^2 - (u^3 + v^3)t + u^3 v^3 = t^2 + qt - p^3/27$. Hence $u^3$
and $v^3$ are roots of the quadratic equation
\[ t^2 + qt - p^3/27 = 0 \]
Hence, using (6.2),
\[ u^3,\ v^3 = -\frac{q}{2} \pm \sqrt{\frac{q^2}{4} + \frac{p^3}{27}} \tag{6.8} \]
Since $y = u + v$, we get for the roots of (6.4) the expression
\[ y = \sqrt[3]{-\frac{q}{2} + \sqrt{\frac{q^2}{4} + \frac{p^3}{27}}} + \sqrt[3]{-\frac{q}{2} - \sqrt{\frac{q^2}{4} + \frac{p^3}{27}}} \tag{6.9} \]


A full discussion of this result would lead us too far afield; but a few words of
explanation are necessary. The formula (6.9) yields in general nine different numbers,
since every nonzero number has three distinct cube roots. The point is that
the two cube roots ($u$ and $v$) indicated in (6.9) must be chosen so as to satisfy (6.6).




EXAMPLE 1 Find the roots of $x^3 + 21x + 342$. This is already in the form (6.4).
Putting $x = u + v$, Eq. (6.6) and (6.8) become
\[ uv = -7 \qquad u^3 = 1 \qquad v^3 = -7^3 \tag{6.10} \]
Taking $u = 1$, we must take $v = -7$, and so one root is $x = u + v = -6$. Let $w$
be the number
\[ w = \cos \frac{2\pi}{3} + i \sin \frac{2\pi}{3} = -\frac{1}{2} + \frac{\sqrt{3}}{2}\, i \]
Then $1$, $w$, $w^2$ are the three cube roots of 1. If we take $u = w$, then to satisfy (6.10)
we must take $v = -7w^2$, and so $w - 7w^2$ is another root of our equation. Finally if
we take $u = w^2$, then we must take $v = -7w$, getting for the third root the number
$w^2 - 7w$. The three roots are, therefore, $-6$, $3 \pm 4\sqrt{3}\, i$.
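
The choice of matching cube roots can be carried out mechanically. The following is only a sketch in floating-point complex arithmetic, assuming $p \neq 0$ so that $u \neq 0$; the function name `depressed_cubic_roots` is ours. One cube root $u$ of $-q/2 + \sqrt{q^2/4 + p^3/27}$ is taken, the other two are obtained by multiplying by $w$ and $w^2$, and in each case $v$ is determined by the condition $3uv + p = 0$ rather than chosen independently.

```python
import cmath

def depressed_cubic_roots(p, q):
    """Roots of y^3 + p*y + q = 0 by the method of the text.

    u^3 is one root of t^2 + q*t - p^3/27 = 0, and v is fixed by 3uv + p = 0.
    This sketch assumes p != 0, which guarantees u != 0.
    """
    if p == 0:
        raise ValueError("this sketch assumes p != 0")
    u_cubed = -q / 2 + cmath.sqrt(q * q / 4 + p ** 3 / 27)
    u0 = u_cubed ** (1 / 3)               # one cube root of u^3
    w = cmath.exp(2j * cmath.pi / 3)      # primitive cube root of 1
    roots = []
    for u in (u0, u0 * w, u0 * w * w):
        v = -p / (3 * u)                  # enforce 3uv + p = 0
        roots.append(u + v)
    return roots

# The cubic y^3 + 21y + 342 = 0
ys = depressed_cubic_roots(21, 342)
residuals = [abs(y ** 3 + 21 * y + 342) for y in ys]
```

For this cubic the three values returned are $-6$ and, up to rounding, $3 \pm 4\sqrt{3}\, i$; the entries of `residuals` are zero up to rounding error. Fixing $v$ from $u$ is what cuts the nine candidate sums down to the three actual roots.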


BIQUADRATIC EQUATIONS Thus are called equations of degree 4:
\[ x^4 + ax^3 + bx^2 + cx + d = 0 \tag{6.11} \]
There are various methods for solving this equation. They are really equivalent
and involve the solution of cubic and quadratic equations. The method we follow
consists of adding $(ex + f)^2$ to both sides and choosing $e$ and $f$ in order to get a
perfect square on the left. Adding $(ex + f)^2$ we get
\[ x^4 + ax^3 + (b + e^2)x^2 + (c + 2ef)x + (d + f^2) = (ex + f)^2 \tag{6.12} \]
Suppose the left-hand side is the square of, say, $x^2 + px + q$. Then we must have,
equating coefficients,
\[ 2p = a \qquad p^2 + 2q = b + e^2 \qquad 2pq = c + 2ef \qquad q^2 = d + f^2 \]
Rewrite the last three of these, squaring the third one:
\[ e^2 = p^2 + 2q - b \qquad 4e^2 f^2 = (2pq - c)^2 \qquad f^2 = q^2 - d \]
Substituting the first and third in the second we get
\[ (2pq - c)^2 = 4(p^2 + 2q - b) \cdot (q^2 - d) \]
or (using $p = a/2$)
\[ (aq - c)^2 = (a^2 + 8q - 4b) \cdot (q^2 - d) \tag{6.13} \]
This is a cubic equation for $q$, and it can be solved for $q$ in terms of $a$, $b$, $c$, $d$ by the
method developed above. Once we have $q$, then $e$ and $f$ can be obtained from the
preceding equations. Equation (6.12) becomes
\[ (x^2 + px + q)^2 = (ex + f)^2 \]
or
\[ (x^2 + (p + e)x + (q + f)) \cdot (x^2 + (p - e)x + (q - f)) = 0 \]
Then finally we are left with the two quadratic equations
\[ x^2 + (p + e)x + (q + f) = 0 \]
\[ x^2 + (p - e)x + (q - f) = 0 \]
to get the four roots of (6.11).


EXAMPLE 2 Solve $x^4 - 2x^2 + 8x - 3 = 0$. For this Eq. (6.13) becomes $q^3 +
q^2 + 3q - 5 = 0$, of which one root is $q = 1$ (we need only one). Then from the
equations above we find $e^2 = 4$, $f^2 = 4$, $ef = -4$, whence $e = 2$, $f = -2$ (or vice
versa). The last two quadratic equations are then $x^2 + 2x - 1 = 0$, $x^2 - 2x +
3 = 0$. Hence the roots are $-1 \pm \sqrt{2}$ and $1 \pm \sqrt{2}\, i$.
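
The steps of the method can be retraced in a few lines. This is only a sketch, assuming that a root $q$ of the cubic (6.13) is already known and that $e \neq 0$; the function names are our own. It recovers $e$ from $e^2 = p^2 + 2q - b$ and $f$ from $2ef = 2pq - c$, forms the two quadratic factors, and multiplies them back together for the quartic $x^4 - 2x^2 + 8x - 3$ with $q = 1$.

```python
import cmath

def resolvent(a, b, c, d, q):
    """Left side minus right side of (aq - c)^2 = (a^2 + 8q - 4b)(q^2 - d)."""
    return (a * q - c) ** 2 - (a * a + 8 * q - 4 * b) * (q * q - d)

def ferrari_quadratics(a, b, c, d, q):
    """Given a root q of the resolvent cubic, return the two quadratic factors
    of x^4 + a x^3 + b x^2 + c x + d, as coefficient lists, highest power first.
    Assumes e != 0 so that f can be recovered from 2ef = 2pq - c."""
    p = a / 2
    e = cmath.sqrt(p * p + 2 * q - b)
    if e == 0:
        raise ValueError("this sketch assumes e != 0")
    f = (2 * p * q - c) / (2 * e)
    return [1, p + e, q + f], [1, p - e, q - f]

def poly_mul(u, v):
    """Multiply two polynomials given as coefficient lists, highest power first."""
    out = [0] * (len(u) + len(v) - 1)
    for i, s in enumerate(u):
        for j, t in enumerate(v):
            out[i + j] += s * t
    return out

# Quartic x^4 - 2x^2 + 8x - 3, i.e. a = 0, b = -2, c = 8, d = -3, with q = 1.
check = resolvent(0, -2, 8, -3, 1)
first, second = ferrari_quadratics(0, -2, 8, -3, 1)
expanded = poly_mul(first, second)
```

Here `check` is zero, the factors are $x^2 + 2x - 1$ and $x^2 - 2x + 3$, and `expanded` reproduces the coefficients $1, 0, -2, 8, -3$ of the quartic. The sign of $f$ is tied to that of $e$ through $2ef = 2pq - c$, which is why $f$ is computed by division rather than as an independent square root.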

We observe that the foregoing solutions of equations of degrees 2, 3, and 4 involve
extracting square and cube roots of certain numbers, along with ordinary algebraic
manipulations. The question naturally arises as to the possibility of solving
equations of higher degree by similar operations. For certain equations of degree
$> 4$, such solutions are possible. For example, $x^n - a = 0$ can be solved for any
positive integer $n$ by simply extracting the $n$th root of $a$ by the method given in
Sec. 3, Chap. 5. The equation $x^6 - 2x^3 + 4 = 0$ can be solved by observing that it
is a quadratic equation in $x^3$, and so $x = (1 \pm \sqrt{-3})^{1/3}$.

But in general, for equations of degree $\ge 5$, it is impossible to obtain the roots by
a series of operations of the kind encountered above, namely, operations in $C$
(that is, $+$, $-$, $\times$, $\div$) and extraction of $n$th roots. Thus there is no formula (however
complicated) for the roots of equations of degree $\ge 5$ analogous to the quadratic
formula.†

The thoughtful reader will, however, find this fact itself less remarkable than
the possibility of proving it. That was first done, for equations of degree 5, by
N. H. Abel (1802-1829). The general question of solvability of equations depends
upon some deep relations among equations, fields, and groups, first fully explained
by E. Galois (1811-1832) in a letter written on the thirtieth of May, 1832. The
following morning he died in a duel over "quelque coquette de bas étage." The
algebraic theory that he sketched in his last few hours is called Galois theory.

With the concepts developed in Galois theory, it is an easy matter to dispose of 
some other famous problems, dating from antiquity. E.g., one can prove the im- 
possibility of such ruler and compass constructions as trisecting the angle and 
duplicating the cube. 


EXERCISES 
1. Find a necessary and sufficient condition for the equation
\[ a_0 + a_1 x + \cdots + a_n x^n = 0 \]
in order that the reciprocal of every root of it also be a root.


† This does not mean of course that it is impossible to compute roots of equations. There
are several methods that enable one to calculate roots with any desired accuracy.




2. Find the roots of the following equations: 
(a) xt — 18¢ — 35 = 0 (b) 2x3 + 80? + 8x +1 = 0 
(© x — 102? — 202 ~16 =0 (d) x — 8x2 —6r —2 =0 
3. Prove that a polynomial $ax^2 + bx + c$ in $R[x]$ with $a \neq 0$ is irreducible in
$R[x]$ if and only if $b^2 - 4ac < 0$.
4. Consider the polynomial $f(x) = ax^2 + bx + c$ in $R[x]$, where $a > 0$ and
$c > 0$. Suppose that $f(x)$ has two distinct real roots $r_1$ and $r_2$. Prove that if $t$ is a
number between $r_1$ and $r_2$, then $f(t) < 0$.
5. Let $a$, $b$, $c$ be real numbers, with $a \ge 0$, $c \ge 0$. Prove that $at^2 + 2bt + c \ge 0$
for all real numbers $t$ if and only if $b^2 \le ac$.
*6. Referring to (6.4), let $w = -\frac{1}{2} + i(\sqrt{3}/2)$. Prove that the formula (6.9)
also gives the roots of the equations $y^3 + wpy + q = 0$ and $y^3 + w^2 py + q = 0$.


Rational functions 


1. Introduction


As we have seen in Chap. 6, the only invertible elements in a polynomial ring K[x]
over a field K are the nonzero elements of the coefficient field K. Thus the symbol
1/x has no meaning for K[x]; there is no such element in the system. The situation
is similar to that which we encountered with Z: the only invertible elements in Z
are 1 and −1; the symbol 1/2 has no meaning for Z. Just as with the integers, it is
often desirable to enlarge the system K[x] in such a way that division is always possible
(excluding division by zero, naturally). Now K[x] is an integral domain, by
definition, and Theorem 3.1, Chap. 3, shows that there exists a field E, essentially
unique, containing K[x] as a subring and such that every element of E can be expressed
as a quotient of two elements of K[x]. That is our starting point.


2. Rational functions


DEFINITION 2.1 Let K be a field, and let $x_1, x_2, \ldots, x_n$ be independent variables
over K. Then the quotient field of the polynomial ring $K[x_1, x_2, \ldots, x_n]$ is called
the field of rational functions in $x_1, x_2, \ldots, x_n$ over K; it is denoted by
$K(x_1, x_2, \ldots, x_n)$.


In this chapter we shall be concerned primarily with just one variable, say x.
According to the definition, K(x) stands for the quotient field of K[x], and by Theorem
3.1, Chap. 3, every element of K(x) can be expressed (in many ways) as the
quotient f(x)/g(x) of two polynomials in K[x], with g(x) ≠ 0. Thus every element
of K(x) can be written in the form

$$\frac{a_0 + a_1 x + \cdots + a_n x^n}{b_0 + b_1 x + \cdots + b_m x^m} \tag{2.1}$$

where $b_0, b_1, \ldots, b_m$ are not all zero. We shall usually denote elements of K(x) by
the same kind of symbols f(x), g(x), etc., we have used for polynomials.

Now let f(x), g(x), f′(x), and g′(x) be four polynomials (that is, elements of K[x]),
with g′(x) and g(x) not zero. By Proposition 2.2, Chap. 3, we have then

$$\frac{f(x)}{g(x)} = \frac{f'(x)}{g'(x)} \quad\text{if and only if}\quad f(x) \cdot g'(x) = f'(x) \cdot g(x)$$




In particular, if f(x) = 0, then also f′(x) = 0, and conversely. If f(x) is not zero,
then its degree is defined, and from the equation above, fg′ = f′g, we see that
deg f − deg g = deg f′ − deg g′. Therefore if for any nonzero element f(x)/g(x)
in K(x) we define

$$\deg \frac{f(x)}{g(x)} = \deg f(x) - \deg g(x) \tag{2.2}$$

then the integer so obtained depends only on the element f(x)/g(x), not upon its
particular representation as a quotient of two polynomials f(x) and g(x). We do
not assign any degree to the zero element of K(x).


EXAMPLE 1 The rational functions x/(x + 1) and $(2x^2 - x + 2)/(x^2 - 3)$ have
degree zero; 1/x has degree −1 and so does $(3x + 1)/(x^2 + x + 1)$.
It is easily verified that for any nonzero elements $h_1(x)$, $h_2(x)$ in K(x) we have

$$\deg (h_1(x) \cdot h_2(x)) = \deg h_1(x) + \deg h_2(x) \tag{2.3}$$

However it is not true in general that $\deg (h_1 + h_2) \leq \deg h_1 + \deg h_2$. That
holds for polynomials, but for example

$$\deg \left(\frac{1}{x} + \frac{1}{x^2}\right) = \deg \frac{x + 1}{x^2} = -1$$

which is not $\leq \deg (1/x) + \deg (1/x^2) = -3$. It is easily shown, however, that

$$\deg (h_1 + h_2) \leq \max\, [\deg h_1, \deg h_2] \tag{2.4}$$

where max indicates the greater of the two numbers (unless they are equal, in
which case max indicates their common value).
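
The degree rules above can be checked by direct computation. The following is a minimal sketch in Python, assuming rational coefficients and a representation chosen here for illustration only: a polynomial is a coefficient list [a0, a1, ...] and a rational function is a pair (numerator, denominator). The helper names (`trim`, `poly_mul`, `rat_deg`, and so on) are hypothetical and not taken from the text.

```python
from fractions import Fraction

def trim(p):
    """Return p as Fractions with trailing zeros removed ([] is the zero polynomial)."""
    p = [Fraction(c) for c in p]
    while p and p[-1] == 0:
        p.pop()
    return p

def poly_deg(p):
    """Degree of a nonzero polynomial; the zero polynomial has no degree."""
    p = trim(p)
    if not p:
        raise ValueError("the zero polynomial has no degree")
    return len(p) - 1

def poly_add(p, q):
    n = max(len(p), len(q))
    p = list(p) + [0] * (n - len(p))
    q = list(q) + [0] * (n - len(q))
    return trim([a + b for a, b in zip(p, q)])

def poly_mul(p, q):
    p, q = trim(p), trim(q)
    if not p or not q:
        return []
    out = [Fraction(0)] * (len(p) + len(q) - 1)
    for i, a in enumerate(p):
        for j, b in enumerate(q):
            out[i + j] += a * b
    return out

def rat_deg(f, g):
    """deg(f/g) = deg f - deg g, for a nonzero quotient f/g."""
    return poly_deg(f) - poly_deg(g)

def rat_add(h1, h2):
    (f1, g1), (f2, g2) = h1, h2
    return poly_add(poly_mul(f1, g2), poly_mul(f2, g1)), poly_mul(g1, g2)

def rat_mul(h1, h2):
    (f1, g1), (f2, g2) = h1, h2
    return poly_mul(f1, f2), poly_mul(g1, g2)

one_over_x = ([1], [0, 1])        # 1/x
one_over_x2 = ([1], [0, 0, 1])    # 1/x^2

total = rat_add(one_over_x, one_over_x2)    # (x^2 + x)/x^3, not in lowest terms
product = rat_mul(one_over_x, one_over_x2)  # 1/x^3
print(rat_deg(*total), rat_deg(*product))
```

The sum is deliberately left unreduced: the degree is computed from whatever numerator and denominator are at hand, which is exactly the independence of representation asserted for the definition of the degree.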

The following theorem shows that every element of K(x) can be expressed in 
“lowest terms.” 


THEOREM 2.1 Let h(x) be any element of K(x). Then h(x) can be expressed in one
and only one way as a quotient

$$h(x) = \frac{f(x)}{g(x)}$$

of relatively prime polynomials, with g(x) monic.


Proof. By definition of K(x) the element h(x) can be expressed in some
way as a quotient $f_1(x)/g_1(x)$ of polynomials, with $g_1 \neq 0$. Let u(x) be the
g.c.d. of $f_1(x)$ and $g_1(x)$ [see (3.1), Chap. 6]. Then we can write $f_1(x) =
f_2(x) \cdot u(x)$ and $g_1(x) = g_2(x) \cdot u(x)$, where $f_2$ and $g_2$ are polynomials.
Then $u \cdot (f_1 g_2 - f_2 g_1) = f_1 g_1 - f_1 g_1 = 0$, and therefore $f_1 g_2 = f_2 g_1$, since
$u(x) \neq 0$. Hence, $h(x) = f_2(x)/g_2(x)$. Now let c be the highest coefficient
of $g_2(x)$ and put $g(x) = (1/c)\,g_2(x)$, $f(x) = (1/c)\,f_2(x)$. Then h(x) =
f(x)/g(x), and g(x) is monic and (f, g) = 1. The uniqueness is left as an
exercise. (Theorem 3.6, Chap. 6, can also be used to prove this theorem.)

Q.E.D.




In Chap. 6 we saw that substitution of any element c of K for x in any polynomial
f(x) of K[x] produces a homomorphism of K[x] to K. This is no longer the case for
the rational-function field K(x). For example, 2 cannot be substituted for x in the
rational function 1/(x − 2), obviously. See Exercise 2 below.


EXERCISES 

1. Prove the uniqueness of f(x) and g(x) in Theorem 2.1. 

2. Let c be an element of the field K, and let A denote the set of all elements
h(x) of K(x) which can be expressed as quotients h(x) = f(x)/g(x) of polynomials
such that g(c) ≠ 0. Show that if h(c) is defined by h(c) = f(c)/g(c), then the result
does not depend upon the particular representation of h(x) as a quotient of
polynomials f/g, provided of course that g(c) ≠ 0. Show that A is a subring of
K(x) and that the mapping A → K defined by h(x) → h(c) is a homomorphism.

3. Prove (2.3).

4. Let p(x) and q(x) be irreducible polynomials in K[x]. Prove that there exist
polynomials f(x), g(x) such that in K(x) we have

$$\frac{1}{pq} = \frac{f}{p} + \frac{g}{q} \quad\text{and}\quad \deg f < \deg p,\ \deg g < \deg q$$


3. Partial fractions


The remainder of this chapter is concerned with a particularly simple and useful 
method of expressing rational functions in K(x), where K is again a field. 

Any element of K(x) can be expressed as a quotient f(x)/g(x) of polynomials,
with g(x) ≠ 0. We can assume that g(x) is monic, for otherwise we have only to
divide f(x) and g(x) by the highest coefficient of g(x). Then by Theorem 3.6,
Chap. 6, the polynomial g(x), if its degree is positive, has a prime decomposition

$$g(x) = p_1(x)^{e_1}\, p_2(x)^{e_2} \cdots p_n(x)^{e_n} \tag{3.1}$$

where $p_1, p_2, \ldots, p_n$ are distinct monic irreducible polynomials in K[x], and
where $e_1, e_2, \ldots, e_n$ are positive integers. Now let $q_j(x)$ denote the polynomial

$$q_j(x) = p_1^{e_1} \cdots p_{j-1}^{e_{j-1}}\, p_{j+1}^{e_{j+1}} \cdots p_n^{e_n} \tag{3.2}$$

obtained by omitting the factor $p_j^{e_j}$ from (3.1). The polynomials $q_1, q_2, \ldots, q_n$
so obtained are relatively prime, for clearly none of the polynomials $p_1, p_2, \ldots, p_n$
divides all the q's. Therefore, by Exercise 4, Sec. 3, Chap. 6, there exist polynomials
$s_1(x), s_2(x), \ldots, s_n(x)$ such that

$$1 = s_1 q_1 + s_2 q_2 + \cdots + s_n q_n$$

Multiplying by f(x)/g(x) and using (3.1) and (3.2) we get

$$\frac{f}{g} = \frac{f s_1}{p_1^{e_1}} + \frac{f s_2}{p_2^{e_2}} + \cdots + \frac{f s_n}{p_n^{e_n}} \tag{3.3}$$

Applying the division algorithm (Proposition 3.2, Chap. 6) to $f \cdot s_j$ and $p_j^{e_j}$ we get

$$f \cdot s_j = h_j \cdot p_j^{e_j} + r_j \quad\text{where } r_j = 0 \text{ or } \deg r_j < \deg p_j^{e_j} \tag{3.4}$$

$h_j$ being of course the quotient and $r_j$ the remainder. Putting this in (3.3) for each
j = 1, 2, ..., n, we obtain finally

$$\frac{f(x)}{g(x)} = h(x) + \frac{r_1(x)}{p_1(x)^{e_1}} + \cdots + \frac{r_n(x)}{p_n(x)^{e_n}} \tag{3.5}$$

where h(x) is the polynomial

$$h(x) = h_1(x) + h_2(x) + \cdots + h_n(x)$$


Now (3.5) can be simplified a little further if any of the exponents $e_1, \ldots, e_n$
exceed 1. Consider a typical term $r(x)/p(x)^e$ on the right of (3.5). We omit the
index j temporarily. If r = 0, then of course there is nothing further to do. If
r ≠ 0, then by (3.4) we have $\deg r < \deg p^e$. If e > 1 we apply the division
algorithm to r(x) and $p(x)^{e-1}$, getting say

$$r = u_1 \cdot p^{e-1} + v_1 \qquad v_1 = 0 \text{ or } \deg v_1 < \deg p^{e-1}$$

If e − 1 > 1, then apply the division algorithm to $v_1$ and $p^{e-2}$, and so on. In this
way we get a series of equations

$$\begin{aligned}
v_1 &= u_2 \cdot p^{e-2} + v_2 & v_2 &= 0 \text{ or } \deg v_2 < \deg p^{e-2} \\
&\;\;\vdots \\
v_{e-3} &= u_{e-2} \cdot p^2 + v_{e-2} & v_{e-2} &= 0 \text{ or } \deg v_{e-2} < \deg p^2 \\
v_{e-2} &= u_{e-1} \cdot p + v_{e-1} & v_{e-1} &= 0 \text{ or } \deg v_{e-1} < \deg p \\
v_{e-1} &= u_e \cdot 1
\end{aligned}$$

From the first equation above we see that $\deg u_1 < \deg p$ because $\deg r < \deg p^e$.
In a similar way it follows that $u_2, u_3, \ldots, u_e$ all have degree < deg p or else
are zero. Combining these equations we get

$$r = u_1 \cdot p^{e-1} + u_2 \cdot p^{e-2} + \cdots + u_{e-1} \cdot p + u_e$$

and so

$$\frac{r}{p^e} = \frac{u_1}{p} + \frac{u_2}{p^2} + \cdots + \frac{u_e}{p^e} \tag{3.6}$$

where $u_j = 0$ or else $\deg u_j < \deg p$. We have therefore proved the following
theorem.


THEOREM 3.1 Let f(x)/g(x) be any element in K(x), where f(x) and g(x) are polynomials
with g(x) ≠ 0. Then f(x)/g(x) can be expressed as a sum of terms of the
following type: (1) a polynomial and (2) if deg g(x) > 0 then for each factor $p(x)^e$ in
the prime factorization of g(x) a sum of terms $u_1/p + \cdots + u_e/p^e$, in which the $u_j$
are polynomials such that if $u_j \neq 0$, then $\deg u_j < \deg p$.




The decomposition of f(x)/g(z) guaranteed by the theorem is called the partial- 
fraction decomposition of f(x)/g(x). It is not hard to see that it is unique in the 
sense that in any two such decompositions the nonzero terms are identical. The 
result is particularly simple for C(x) because the irreducible polynomials in C[x]
all have degree 1, and so the numerators $u_j$ appearing in the theorem are all constants.
In R[x] irreducible polynomials have degree 1 or 2 and therefore in that
case the numerators all have degree at most 1. Several examples are worked out 
below. The proof of the theorem carries with it a method for making explicit 
computations, because each step of the proof simply involves use of the division 
algorithm. However in practice there are many short cuts which materially re- 
duce the labor entailed. 
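
The remark that the proof carries a method of computation can be made concrete. Below is a minimal sketch in Python of the step that leads to (3.6): successive divisions of r by $p^{e-1}, p^{e-2}, \ldots, p, 1$. It assumes rational coefficients and stores a polynomial as a coefficient list [a0, a1, ...]; that representation and the helper names (`poly_divmod`, `expand_power_term`) are choices made here for illustration, not part of the text. The sample input $r = -x^2 - 2x - 7$, $p = x + 1$, e = 3 is likewise only an illustration.

```python
from fractions import Fraction

def trim(p):
    """Return p as Fractions with trailing zeros removed ([] is the zero polynomial)."""
    p = [Fraction(c) for c in p]
    while p and p[-1] == 0:
        p.pop()
    return p

def poly_mul(p, q):
    p, q = trim(p), trim(q)
    if not p or not q:
        return []
    out = [Fraction(0)] * (len(p) + len(q) - 1)
    for i, a in enumerate(p):
        for j, b in enumerate(q):
            out[i + j] += a * b
    return out

def poly_divmod(a, b):
    """Division algorithm: return (q, r) with a = q*b + r and r = 0 or deg r < deg b."""
    a, b = trim(a), trim(b)
    if not b:
        raise ZeroDivisionError("division by the zero polynomial")
    q = [Fraction(0)] * max(len(a) - len(b) + 1, 0)
    r = a
    while len(r) >= len(b):
        shift = len(r) - len(b)
        coeff = r[-1] / b[-1]          # cancel the leading term of r
        q[shift] = coeff
        for i, c in enumerate(b):
            r[i + shift] -= coeff * c
        r = trim(r)
    return trim(q), r

def expand_power_term(r, p, e):
    """For deg r < deg p^e, return [u_1, ..., u_e] with r/p^e = u_1/p + ... + u_e/p^e."""
    powers = [[Fraction(1)]]           # powers[k] holds p^k
    for _ in range(e - 1):
        powers.append(poly_mul(powers[-1], p))
    us, v = [], trim(r)
    for k in range(e - 1, -1, -1):     # divide by p^(e-1), ..., p, 1 in turn
        u, v = poly_divmod(v, powers[k])
        us.append(u)
    return us

# r = -x^2 - 2x - 7, p = x + 1, e = 3
us = expand_power_term([-7, -2, -1], [1, 1], 3)
print(us)   # u_1 = -1, u_2 = 0 (empty list), u_3 = -6
```

Each numerator comes out with degree less than deg p, or is zero, as the theorem requires.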


EXAMPLE 1 Find the partial-fraction decomposition of

$$f(x) = \frac{3x^2 + 2x + 2}{x^3 + x^2 - 2x - 2}$$

The denominator factors into $(x^2 - 2)(x + 1)$, and these factors are irreducible in
Q[x]. Hence in Q(x)

$$f(x) = h(x) + \frac{u}{x^2 - 2} + \frac{v}{x + 1}$$

where h, u, v are polynomials. The degree of u(x) must be less than 2, and the
degree of v(x) must be less than 1. Hence u has the form u(x) = ax + b and v(x)
has the form v = c, where a, b, c are constants. Now multiply the equation above
by the denominator, getting

$$3x^2 + 2x + 2 = h(x)(x^2 - 2)(x + 1) + (ax + b)(x + 1) + c(x^2 - 2)$$

It is clear from this that h(x) = 0, for otherwise the right-hand side would have
degree ≥ 3. Multiplying out the factors we get then

$$3x^2 + 2x + 2 = (a + c)x^2 + (a + b)x + b - 2c$$

Therefore (Proposition 2.2, Chap. 6) we have a + c = 3, a + b = 2, b − 2c = 2.
Solving these three simultaneous equations we get a = 6, b = −4, c = −3, and so
the desired decomposition is

$$f(x) = \frac{6x - 4}{x^2 - 2} - \frac{3}{x + 1}$$


Now in R[x] the polynomial $x^2 - 2$ is not irreducible. In fact $x^2 - 2 =
(x - \sqrt{2})(x + \sqrt{2})$. Therefore, applying Theorem 3.1 to the first term above we
get

$$\frac{6x - 4}{x^2 - 2} = h_1(x) + \frac{a'}{x - \sqrt{2}} + \frac{b'}{x + \sqrt{2}}$$

where $h_1(x)$ is a polynomial and where a′, b′ are constants. Multiplying both sides
by $x^2 - 2$, we see at once that $h_1(x) = 0$, and

$$6x - 4 = a'(x + \sqrt{2}) + b'(x - \sqrt{2})$$

whence $a' + b' = 6$, $(a' - b') \cdot \sqrt{2} = -4$. Solving these we get $a' = 3 - \sqrt{2}$,
$b' = 3 + \sqrt{2}$. Hence the partial-fraction decomposition of f(x) in R(x) is

$$f(x) = \frac{3 - \sqrt{2}}{x - \sqrt{2}} + \frac{3 + \sqrt{2}}{x + \sqrt{2}} - \frac{3}{x + 1}$$


Remark. It is easily seen that if deg f < deg g in Theorem 3.1, then the polynomial
part (1) of the partial-fraction decomposition is zero. If deg f ≥ deg g, then
using the division algorithm we can write $f(x) = q(x) \cdot g(x) + f_1(x)$, where $\deg f_1 <
\deg g$. Hence $f/g = q + f_1/g$, and so q(x) is the polynomial part of the partial-fraction
decomposition of f/g.


EXAMPLE 2 Find the partial-fraction decomposition in Q(x) of

$$f(x) = \frac{9x^2 + 10x - 2}{2x^3 - 7x^2 - 14x + 40}$$

Here the numerator has smaller degree than the denominator, and so the polynomial
part is zero. The denominator factors into (x − 2)(x − 4)(2x + 5), and so
by Theorem 3.1 we have

$$f(x) = \frac{a}{x - 2} + \frac{b}{x - 4} + \frac{c}{2x + 5} \tag{3.7}$$

where a, b, c must be constants. Multiplying both sides of this equation by the
denominator we get

$$9x^2 + 10x - 2 = a(x - 4)(2x + 5) + b(x - 2)(2x + 5) + c(x - 2)(x - 4)$$

Multiplying this out and equating coefficients we get

$$\begin{aligned}
2a + 2b + c &= 9 \\
-3a + b - 6c &= 10 \\
-20a - 10b + 8c &= -2
\end{aligned}$$

Solving these equations we get a = −3, b = 7, c = 1, and so

$$f(x) = -\frac{3}{x - 2} + \frac{7}{x - 4} + \frac{1}{2x + 5}$$

We now point out a useful short cut which applies in many cases; namely, multiply
(3.7) by x − 2. There results

$$\frac{9x^2 + 10x - 2}{(x - 4)(2x + 5)} = a + b\,\frac{x - 2}{x - 4} + c\,\frac{x - 2}{2x + 5}$$

Put x = 2 in this, getting

$$\frac{36 + 20 - 2}{(-2)(9)} = a$$

or a = −3. Both b and c can be found in the same way.
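
The short cut can be carried out mechanically for all three constants. The sketch below, in Python with exact rational arithmetic, applies it to the function $(9x^2 + 10x - 2)/((x - 2)(x - 4)(2x + 5))$ of this example; the function name `cover_up` and the encoding of each linear factor as a (slope, intercept) pair are assumptions made here for illustration. It applies only when the denominator is a product of distinct linear factors.

```python
from fractions import Fraction

def numerator(x):
    return 9 * x**2 + 10 * x - 2

# Linear factors of the denominator, each stored as (m, k) for m*x + k:
# (x - 2), (x - 4), (2x + 5)
factors = [(1, -2), (1, -4), (2, 5)]

def cover_up(j):
    """Constant over the j-th factor: multiply by that factor, then substitute its root."""
    m, k = factors[j]
    root = Fraction(-k, m)
    value = Fraction(numerator(root))
    for i, (mi, ki) in enumerate(factors):
        if i != j:
            value /= mi * root + ki   # remaining factors evaluated at the root
    return value

a, b, c = (cover_up(j) for j in range(3))
print(a, b, c)   # -3 7 1

# Spot check at x = 0: both sides of the decomposition should agree.
x = Fraction(0)
lhs = Fraction(numerator(x)) / ((x - 2) * (x - 4) * (2 * x + 5))
rhs = a / (x - 2) + b / (x - 4) + c / (2 * x + 5)
```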


EXAMPLE 3 For the rational function $f(x) = (3x^2 + 15)/((x + 1)^3 (x - 2))$ the
theorem tells us that the partial-fraction decomposition has the form

$$\frac{3x^2 + 15}{(x + 1)^3 (x - 2)} = \frac{a}{x + 1} + \frac{b}{(x + 1)^2} + \frac{c}{(x + 1)^3} + \frac{d}{x - 2} \tag{3.8}$$

where a, b, c, and d are constants. Multiplying through by x − 2 and putting
2 for x in the result, we get d = 1. Similarly, multiplying by $(x + 1)^3$ and putting
x = −1, we find c = −6. But a and b cannot be found so simply. To determine
them we proceed as in Examples 1 and 2: namely, multiply (3.8) by the total
denominator, getting

$$3x^2 + 15 = a(x + 1)^2 (x - 2) + b(x + 1)(x - 2) - 6(x - 2) + (x + 1)^3$$

Equating coefficients here we obtain

$$\begin{aligned}
a + 1 &= 0 \\
b &= 0 \\
3a + b &= -3 \\
a + b &= -1
\end{aligned}$$

Naturally only the first two equations are necessary, and we have a = −1, b = 0.

We mention here another device that is sometimes useful: namely, put x + 1 =
y in (3.8). It becomes

$$\frac{3y^2 - 6y + 18}{y^3 (y - 3)} = \frac{a}{y} + \frac{b}{y^2} + \frac{c}{y^3} + \frac{d}{y - 3}$$

Clearing out the denominator gives

$$3y^2 - 6y + 18 = (ay^2 + by + c)(y - 3) + dy^3$$

and it is very easy to equate coefficients here.

EXERCISES 


1. Find the partial-fraction decomposition of each of the following, in both Q(z) 
and C(z). 


@) Beta 
(2 — 3)(@ + 1)( + 2) 
@ td 
P= 9x 


QxF + Bart + 52" 
G+ De + 


Bat = 2r (Sat = Bde? + 48 
(x ~ 2y'@ + 1) 


{e) 




@ __ 262? + 208% 
Ge? + 1)P@ + 5) 

2. A theorem analogous to Theorem 3.1 above holds for integers. Work it out. 
Express 25% 49 in the form n + a/5 + 5/2 +¢/2! + 4/8 + e/3? + /3%, where 
0<a<5,0 £b0<2,064ef <3. 

*3. Prove that for any positive integer n 


(x = I) - 2) 


“poe CoG es 


Bai 
4, Find the partial-fractions decomposition of 
i a 
+ Pot 


in C(x) and in R(a). 


Vector spaces and affine spaces 


1. Introduction 


The subject of vector spaces, which sometimes goes under the alias “matrix alge- 
bra,” is one of the most important parts of mathematics in applications to the 
natural sciences, and it is an indispensable tool in many parts of mathematics 
itself. In physical applications, the elements of a vector space are used to represent 
complex physical quantities. For example, forces and velocities are represented in 
mechanics by elements of a vector space; in quantum mechanics, the states of a
system are represented by elements of a vector space; in modern electrical engineer- 
ing, signals are represented by such elements; and so forth. 

We do not intend here to study any of the various applications of vector spaces, 
and therefore we do not want to attach any unnecessary interpretations to them. 
We shall study vector spaces as abstract algebraic systems in order to discover the 
main consequences of the algebraic laws which are assumed to hold. Elements of 
vector spaces will often be called “vectors,” but again we attach no physical mean- 
ing to that term. 


2. The basic definitions 


The main ingredients required are (1) a field K and (2) an abelian group V. The
elements of the field K will be denoted by lowercase italic letters a, b, c, etc., and
will frequently be called scalars. The elements of the abelian group V will be denoted
by boldface letters $\mathbf{u}$, $\mathbf{v}$, $\mathbf{x}$, etc., and sometimes those elements will be called
vectors.
First let us recall from Chap. 1 that an abelian group V is a set of elements with
a binary operation (denoted here by +) such that the following axioms apply:
(1) The operation + assigns an element $\mathbf{u} + \mathbf{v}$ of V to every pair of elements
$\mathbf{u}$, $\mathbf{v}$ in V.
(2) $\mathbf{u} + (\mathbf{v} + \mathbf{w}) = (\mathbf{u} + \mathbf{v}) + \mathbf{w}$ for any elements $\mathbf{u}$, $\mathbf{v}$, $\mathbf{w}$ of V.
(3) $\mathbf{u} + \mathbf{v} = \mathbf{v} + \mathbf{u}$ for any elements $\mathbf{u}$, $\mathbf{v}$ in V.
(4) V contains an “identity element” $\mathbf{0}$ (necessarily unique) such that $\mathbf{u} + \mathbf{0} = \mathbf{u}$
for every element $\mathbf{u}$ in V.
(5) For each element $\mathbf{u}$ in V there is a unique element in V, denoted by $-\mathbf{u}$, such
that $\mathbf{u} + (-\mathbf{u}) = \mathbf{0}$.




For the definition of a field we refer to Chap. 3. 

Now V is said to be a vector space (or linear space) over the field K if, in addition 
to the various operations in K and V, there is given another operation, called 
scalar multiplication, satisfying the following new axioms. 

(6) For any element a in K and any element $\mathbf{u}$ in V, scalar multiplication assigns
to the pair a, $\mathbf{u}$ another element of V, which we denote by $a \cdot \mathbf{u}$ or $a\mathbf{u}$. It is
called the product of a and $\mathbf{u}$.

(7) $(a \cdot b) \cdot \mathbf{u} = a \cdot (b \cdot \mathbf{u})$ for any elements a, b in K and $\mathbf{u}$ in V.

(8) $a \cdot (\mathbf{u} + \mathbf{v}) = (a \cdot \mathbf{u}) + (a \cdot \mathbf{v})$ for any a in K and any elements $\mathbf{u}$, $\mathbf{v}$ in V.

(9) $(a + b) \cdot \mathbf{u} = (a \cdot \mathbf{u}) + (b \cdot \mathbf{u})$ for any a, b in K and any element $\mathbf{u}$ in V.

(10) $1 \cdot \mathbf{u} = \mathbf{u}$ for every element $\mathbf{u}$ in V, where 1 is the identity element for multiplication
in K.

These axioms say in effect that there is some way of “multiplying” elements of
V by elements of K, giving again elements of V and satisfying reasonable rules of
calculation. Axiom (7) above is a kind of associative law, and axioms (8) and (9)
are distributive laws. When both + and scalar multiplication are involved in an
expression we shall follow the usual conventions concerning omission of parentheses.
For example, we write axioms (8) and (9) simply as $a(\mathbf{u} + \mathbf{v}) = a\mathbf{u} + a\mathbf{v}$ and
$(a + b)\mathbf{u} = a\mathbf{u} + b\mathbf{u}$, respectively.

In most physical applications of vector spaces the field of scalars K is either the 
field of real numbers or the field of complex numbers. We shall give special atten- 
tion to these cases later. 


3. Some consequences of the axioms 


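Both K and V contain zero elements, i.e., the identity elements for their respective
operations of addition. We denote the zero element of V by $\mathbf{0}$, that of K by 0 as
usual.


PROPOSITION 3.1 For any element $\mathbf{u}$ in V we have

$$0 \cdot \mathbf{u} = \mathbf{0};$$

for any element c in K we have

$$c \cdot \mathbf{0} = \mathbf{0}$$


Proof. Since 1 + 0 = 1 in K we have $1 \cdot \mathbf{u} = (1 + 0) \cdot \mathbf{u}$, whence
$1 \cdot \mathbf{u} = 1 \cdot \mathbf{u} + 0 \cdot \mathbf{u}$, or $\mathbf{u} = \mathbf{u} + 0 \cdot \mathbf{u}$, by axioms (9) and (10). Adding
$-\mathbf{u}$ to both sides we get $\mathbf{0} = \mathbf{0} + 0 \cdot \mathbf{u}$ [axiom (5)], and so $\mathbf{0} = 0 \cdot \mathbf{u}$, by
axiom (4).

To prove the second part we start with $\mathbf{v} + \mathbf{0} = \mathbf{v}$ for any $\mathbf{v}$ in V
[axiom (4)]. Then $c \cdot (\mathbf{v} + \mathbf{0}) = c \cdot \mathbf{v}$, and so from axiom (8) we have
$c \cdot \mathbf{v} + c \cdot \mathbf{0} = c \cdot \mathbf{v}$. Now $c \cdot \mathbf{v}$ is an element of V [axiom (6)] and so it has
an inverse $-(c \cdot \mathbf{v})$ in V, by axiom (5). Adding that to both sides we get
$c \cdot \mathbf{0} = \mathbf{0}$. Q.E.D.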


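PROPOSITION 3.2 For any c in K and any $\mathbf{u}$ in V,

$$(-c)\mathbf{u} = c(-\mathbf{u}) = -(c\mathbf{u})$$

The proof is left as an exercise (cf. Theorem 2.3, Chap. 2).


PROPOSITION 3.3 Let c be any element of K, and let $\mathbf{u}_1, \mathbf{u}_2, \ldots, \mathbf{u}_n$ be any finite set
of elements in V. Then

$$c \cdot (\mathbf{u}_1 + \cdots + \mathbf{u}_n) = c\mathbf{u}_1 + \cdots + c\mathbf{u}_n$$

Similarly if $a_1, a_2, \ldots, a_s$ are any elements of K and $\mathbf{v}$ any element of V, then

$$(a_1 + \cdots + a_s) \cdot \mathbf{v} = a_1\mathbf{v} + \cdots + a_s\mathbf{v}$$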


The proof is left as an exercise. The first part follows from axiom (8) by a simple 
induction, and the second assertion follows in the same way from axiom (9). Argu- 
ments of a very similar sort were given in Sec. 7, Chap. 2. 

The theorems above, as well as the axioms themselves, are used constantly in 
what follows, and we shall usually not refer to them explicitly. 


EXERCISES 


1. Give complete proofs of Propositions 3.2 and 3.3. 

2. Let V be a vector space over a field K, and let $\mathbf{v}_1$ and $\mathbf{v}_2$ be two fixed vectors
in V. Let V′ be the subset of V consisting of all elements $a\mathbf{v}_1 + b\mathbf{v}_2$, where a, b
are arbitrary scalars. Show that the sum of two elements of V′ is again in V′;
show that the product of an element of K and an element of V′ is again in V′.
Finally show that V′ satisfies the axioms for a vector space over K.

3. Let U and V be any two vector spaces over a field K. Let W be the set of
all pairs $(\mathbf{u}, \mathbf{v})$ consisting of an element $\mathbf{u}$ in U and an element $\mathbf{v}$ in V. Define an
operation + in W by the rule $(\mathbf{u}, \mathbf{v}) + (\mathbf{u}', \mathbf{v}') = (\mathbf{u} + \mathbf{u}', \mathbf{v} + \mathbf{v}')$. Further define
scalar multiplication in W by the rule $c \cdot (\mathbf{u}, \mathbf{v}) = (c\mathbf{u}, c\mathbf{v})$ for any c in K and any
element $(\mathbf{u}, \mathbf{v})$ in W. Prove that W with these operations is also a vector space
over K. (W is called the direct sum of U and V and is sometimes denoted by
$U \oplus V$.)

4. Let R and C be the fields of real and complex numbers, respectively (see
Chaps. 4 and 5). Show that C (with its usual operation of addition but with
multiplication simply ignored) is a vector space over R if scalar multiplication is defined
by the rule $c \cdot (a + bi) = ca + cb\,i$, where a + bi is any complex number,
a and b being real, and where c is any real number.


4. Some important examples 


Here we define some special vector spaces which will be used frequently. 
Let K be a field, and let n be a positive whole number. Denote by $K_n$ the set
of all n-tuples

$$\mathbf{a} = (a_1, a_2, \ldots, a_n)$$

consisting of n elements of K in a specific order. We make $K_n$ into a vector space
over K as follows:

(4.1) If $\mathbf{a} = (a_1, a_2, \ldots, a_n)$ and $\mathbf{b} = (b_1, b_2, \ldots, b_n)$ are any two elements of
$K_n$, then we define

$$\mathbf{a} + \mathbf{b} = (a_1 + b_1, a_2 + b_2, \ldots, a_n + b_n)$$

(4.2) If $\mathbf{a} = (a_1, a_2, \ldots, a_n)$ is any element of $K_n$ and c any element of K, then
we define

$$c \cdot \mathbf{a} = (ca_1, ca_2, \ldots, ca_n)$$

It is easily verified (by using the axioms for K!) that addition as defined in (4.1)
satisfies axioms (1) to (5), Sec. 2, and that scalar multiplication as defined by (4.2)
satisfies axioms (6) to (10), Sec. 2. Therefore $K_n$ with those operations is a vector
space over K. The zero element of $K_n$ is the n-tuple (0, 0, ..., 0).
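
The two definitions translate directly into code. The following is a minimal sketch in Python, taking K to be the rational numbers and representing an n-tuple as a Python tuple of `Fraction` values; the function names `vec_add` and `scal_mul` are hypothetical. The sample vectors are arbitrary, and the checks at the end are spot checks of a few axioms on those samples, not a proof.

```python
from fractions import Fraction

def vec_add(a, b):
    """(4.1): componentwise sum of two n-tuples."""
    if len(a) != len(b):
        raise ValueError("both tuples must have the same length n")
    return tuple(x + y for x, y in zip(a, b))

def scal_mul(c, a):
    """(4.2): multiply every component of the n-tuple a by the scalar c."""
    return tuple(c * x for x in a)

a = (Fraction(1), Fraction(1, 2), Fraction(-3))
b = (Fraction(2), Fraction(0), Fraction(5, 3))
c, d = Fraction(3, 4), Fraction(-2)
zero = (Fraction(0),) * 3

print(vec_add(a, b))
print(scal_mul(c, a))

# Spot checks of axioms (3), (4), (8), (9) and (10) on the sample vectors.
commutative = vec_add(a, b) == vec_add(b, a)
has_zero = vec_add(a, zero) == a
distributes_over_vectors = scal_mul(c, vec_add(a, b)) == vec_add(scal_mul(c, a), scal_mul(c, b))
distributes_over_scalars = scal_mul(c + d, a) == vec_add(scal_mul(c, a), scal_mul(d, a))
unit_acts_trivially = scal_mul(Fraction(1), a) == a
```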


REMARK 1. If we take n = 1 in the foregoing, then $K_1$ is the same as K itself,
save that in $K_1$ we ignore the multiplication of two elements. We shall not bother
to distinguish between K and $K_1$.

Figure 1 Elements of a vector space are sometimes depicted by arrows. If one
chooses a cartesian coordinate system in the plane, the element $(a_1, a_2)$ of the vector
space $\mathbf{R}_2$ is depicted by an arrow from the origin (0, 0) to the point with coordinates
$(a_1, a_2)$ for any $a_1$, $a_2$ in the field of real numbers R. Vector addition is obtained by
“completing the parallelogram” as shown in the figure. (Labeled point: $(a_1 + b_1, a_2 + b_2)$.)

Figure 2 Subtraction in the vector space $\mathbf{R}_2$. (Labeled points: (0, 0) and $(b_1 - a_1, b_2 - a_2)$.)


Remark 2. Referring to Exercise 3, Sec. 3, we see that $K_2$ is nothing but the
direct sum $K \oplus K$. More generally $K_n$ can be considered as the direct sum
$K \oplus K \oplus \cdots \oplus K$ of K with itself n times.


In the definition above, instead of using ordered n-tuples $(a_1, a_2, \ldots, a_n)$ of
elements in K, we could equally well use infinite sequences of elements $(a_1, a_2,
a_3, \ldots)$. Denoting the set of all such sequences by $K_\infty$, it is easily verified that
$K_\infty$ can be made into a vector space over K by definitions analogous to (4.1)
and (4.2).

In the definition of the notion of vector space in Sec. 2 only one binary operation
+ was assumed in V; we did not assume that any kind of “multiplication” of two
elements of V was defined. But as the example of $K_1$ shows, it may happen that
a vector space V comes equipped with more than one binary operation. We shall
encounter several important examples in which this is the case. If in V there is
defined a second binary operation $\mathbf{u} \cdot \mathbf{v}$ such that V is a ring (see Sec. 2, Chap. 2),
as well as a vector space over K, then V is called a K-algebra (or hypercomplex
system over K) if the following new associative axiom is satisfied:

$$c(\mathbf{u} \cdot \mathbf{v}) = (c\mathbf{u}) \cdot \mathbf{v} = \mathbf{u} \cdot (c\mathbf{v}) \quad\text{for any } c \text{ in } K \text{ and any } \mathbf{u}, \mathbf{v} \text{ in } V \tag{4.3}$$


For example, C is an R-algebra (see Exercise 4, Sec. 3). Similarly the set of all
2 × 2 matrices

$$\begin{pmatrix} a & b \\ c & d \end{pmatrix}$$

with coefficients in K is an example of a K-algebra (see Exercise 3, Sec. 2, Chap. 2).
Note that 2 × 2 matrices are simply quadruples of elements of K in a specific
order, and therefore the set of all 2 × 2 matrices with coefficients in K can be
identified with $K_4$ as defined above.

We mention some other important examples of vector spaces: K being a field,
let K[t] be the set of all polynomials in an indeterminate t with coefficients in K.
Then with the usual definition of the sum of two polynomials and multiplication
of polynomials by elements of K, it is easily verified that K[t] is a vector space
over K. K[t] is in fact a K-algebra because an operation of multiplication of two
elements of K[t] is also defined, and it satisfies (4.3).

For any positive integer n, let $K[t]_n$ denote the set of polynomials in K[t] of
degree less than n. The sum of two such polynomials is again a polynomial of degree
< n, and so is the product of a polynomial in $K[t]_n$ by an element of K. It is
easily verified that $K[t]_n$ is a vector space over K. (But it is not a K-algebra, except
for n = 1, because in general the product of two polynomials in $K[t]_n$ will not
be in $K[t]_n$.)


EXERCISES 
1. Perform the indicated operations in Qs, Q being the field of rational numbers: 
3-1, 0,0) +4-@,1,0) —16-,0,b, 1,2) —5-@1,0) 
and 
3: (0,1, 4) — @,1,2) +19- 4, 12, -2) 44,1, -D 
2. Prove that the only rational numbers a, b such that
$$a \cdot (2, 1, 3) + b \cdot (2, -1, 4) = \mathbf{0}$$
are a = b = 0.
3. Show that there are rational numbers a, b, c not all zero such that
$$a \cdot (1, 2, 2) + b \cdot (3, 0, 4) + c \cdot (5, -2, 6) = \mathbf{0}$$
Describe the set of all such triples (a, b, c).
4. Consider the vectors $\mathbf{u}_1 = (1, \sqrt{2})$ and $\mathbf{u}_2 = (\sqrt{3}, 2)$ in $\mathbf{R}_2$. Prove that the
only real numbers a and b such that
$$a\mathbf{u}_1 + b\mathbf{u}_2 = \mathbf{0}$$
are a = b = 0. Let $\mathbf{v}$ be an arbitrary element of $\mathbf{R}_2$. Prove that there exist
uniquely determined real numbers a and b such that
$$\mathbf{v} = a\mathbf{u}_1 + b\mathbf{u}_2$$
5. Let $\mathbf{e}_1 = (1, 0, 0)$, $\mathbf{e}_2 = (0, 1, 0)$, and $\mathbf{e}_3 = (0, 0, 1)$ be in $\mathbf{R}_3$. Let $\mathbf{v}$ be an
arbitrary vector in $\mathbf{R}_3$. Prove that there exist uniquely determined real numbers
x, y, z such that
$$\mathbf{v} = x\mathbf{e}_1 + y\mathbf{e}_2 + z\mathbf{e}_3$$


5. Subspaces 


V being a vector space over a field K, it may happen that a part of V is also a
vector space over K. The part is then called a subspace of V. For example,
$K[t]_n$ (defined at the end of Sec. 4) is a subspace of K[t]. The formal definition is
given below.


DEFINITION 5.1 Let U be a subset of V. Then U is called a subspace of V if the following
conditions are satisfied:

(1) U is not empty.

(2) If $\mathbf{u}$ and $\mathbf{v}$ are any two elements of U, then $\mathbf{u} + \mathbf{v}$ is also in U.

(3) For any $\mathbf{u}$ in U and any c in K, the element $c\mathbf{u}$ is also in U.

The last two conditions say simply that the vector-space operations in V, when
applied to elements of the subset U, produce elements in U. It is easily verified
from the axioms for V that if U is a subspace, as just defined, then U also satisfies
the vector-space axioms when equipped with the operations defined in V, and U
is therefore a vector space over K.

It is clear that V itself satisfies the conditions of Definition 5.1, and consequently
V is to be counted as a subspace of V. Observe that the zero element of V forms by
itself a subspace of V, as follows at once from Proposition 3.1.


EXAMPLES Let K be a field, and let $K_3$ be the vector space defined in Sec. 4. Then
all elements of $K_3$ of the form $(a_1, a_2, 0)$ constitute a subspace U of $K_3$. Similarly
all vectors in $K_3$ of the type $(a_1, 0, 0)$ form a subspace W of $K_3$. Moreover, W is
also a subspace of U. The verifications are trivial.

As mentioned above, $K[t]_n$ is a subspace of K[t]. Furthermore, if m < n, then
$K[t]_m$ is a subspace of $K[t]_n$.

The set V′ defined in Exercise 2, Sec. 3, is a subspace of V. The following
proposition is a generalization of that exercise.


PROPOSITION 5.1 Let V be a vector space over a field K, and let $\mathbf{u}_1, \ldots, \mathbf{u}_r$ be a
set of r vectors in V (r > 0). Let U be the set consisting of all elements $x_1\mathbf{u}_1 +
x_2\mathbf{u}_2 + \cdots + x_r\mathbf{u}_r$, where $x_1, \ldots, x_r$ are arbitrary elements of K. Then U is a
subspace of V (called the space generated or spanned by $\mathbf{u}_1, \ldots, \mathbf{u}_r$).

The proof is left as an exercise. 


DEFINITION 5.2 Let V be a vector space, and let $U_1, \ldots, U_r$ be subspaces. Then
V is said to be a direct sum of those subspaces if every element $\mathbf{v}$ in V can be expressed
in one and only one way as a sum

$$\mathbf{v} = \mathbf{u}_1 + \mathbf{u}_2 + \cdots + \mathbf{u}_r$$

where $\mathbf{u}_1$ is in $U_1$, $\mathbf{u}_2$ in $U_2$, ..., $\mathbf{u}_r$ in $U_r$.
In this situation we write $V = U_1 \oplus U_2 \oplus \cdots \oplus U_r$.

EXAMPLE $K_n$ is the direct sum of the subspaces $U_1, \ldots, U_n$, where $U_j$ consists
of all elements in $K_n$ of the type $(0, \ldots, a_j, \ldots, 0)$, in which the only nonzero
element is the jth. As another example, $K_4$ is the direct sum of $V_1$ and $V_2$, where
$V_1$ consists of all vectors of the form $(a_1, a_2, 0, 0)$ and $V_2$ consists of all vectors of
the form $(0, 0, a_3, a_4)$. Etc.

Referring to Exercise 3, Sec. 3, it is easily verified that W is the direct sum of
the subspaces U′ and V′ defined as follows: U′ consists of all vectors of the type
$(\mathbf{u}, \mathbf{0})$, and V′ consists of all vectors of the type $(\mathbf{0}, \mathbf{v})$. It is clear moreover that
U′ is essentially the same as the original space U. More precisely, the mapping
$\mathbf{u} \to (\mathbf{u}, \mathbf{0})$ of U to U′ is a one-to-one correspondence which is an isomorphism
(i.e., is compatible with the vector-space operations). Similarly V and V′ are
isomorphic. (The notion of isomorphism of vector spaces is analyzed more fully
in Chap. 9.) Hence the direct sum as defined in Exercise 3, Sec. 3, is substantially
the same as that of Definition 5.2.




EXERCISES 


1. Prove Proposition 5.1. 

2. Let $U_1$ and $U_2$ be two subspaces of a vector space V. Show that the set
consisting of all elements common to both $U_1$ and $U_2$ is a subspace of V and is
also a subspace of $U_1$ and of $U_2$.

3. Let V be a vector space, and let $U_1$ and $U_2$ be two subsets of V, with $U_2$ contained
in $U_1$. Prove that if $U_1$ is a subspace of V and if $U_2$ is a subspace of $U_1$,
then $U_2$ is a subspace of V. Prove that if $U_1$ and $U_2$ are subspaces of V, then $U_2$
is a subspace of $U_1$.

4. Let V be a vector space, and let $\{U_j\}$ be any family of subspaces, possibly
infinite in number. Let W be the set of elements common to all the subspaces
$U_j$ in the family. Prove that W is a subspace of V.

5. Let V be a vector space over a field K, and let M be any subset of V containing
at least one element. Let U be the subset of V consisting of all elements
that can be expressed as finite sums of terms of the type $c \cdot \mathbf{u}$, where c is in K and
$\mathbf{u}$ is in M. Prove that U is a subspace of V (it is called the subspace generated, or
spanned, by M). If M consists of only a finite number of vectors $\mathbf{u}_1, \mathbf{u}_2, \ldots, \mathbf{u}_r$
in V, show that U as defined here is the same as the subspace U defined in Proposition
5.1.

6. K, V, M, and U being as in the preceding exercise, let U′ be any subspace of
V containing M. Prove that U must be a subspace of U′. (Thus U is the smallest
subspace of V containing M.)

7. Let V be a vector space over a field K, and let $\mathbf{u}_1, \mathbf{u}_2, \ldots, \mathbf{u}_r$ be fixed
vectors in V. Prove that all r-tuples $(x_1, x_2, \ldots, x_r)$ of elements in K such
that $x_1\mathbf{u}_1 + x_2\mathbf{u}_2 + \cdots + x_r\mathbf{u}_r = \mathbf{0}$ form a subspace of $K_r$. Describe that subspace
in the case where $\mathbf{u}_1, \mathbf{u}_2, \ldots$ are the following three vectors of $K_4$:

$\mathbf{u}_1 = (1, 0, 0, 0)$   $\mathbf{u}_2 = (1, 1, 0, 0)$   $\mathbf{u}_3 = (0, -1, 0, 0)$

8. Let V be a vector space, and let $U_1$ and $U_2$ be two subspaces. Prove that
V is the direct sum of $U_1$ and $U_2$ if and only if the following hold: $U_1$ and $U_2$
have no element in common except $\mathbf{0}$; the elements of $U_1$ and $U_2$ generate V, in
the sense of Exercise 5.

9. Let V be the direct sum of subspaces $U_1$, $U_2$, $U_3$. Let W be the subspace of
V generated by the elements of $U_2$ and $U_3$. Prove that V is the direct sum of $U_1$
and W.


6. Linear independence and dimension 


DEFINITION 6.1 Let V be a vector space over a field K, and let $\mathbf{u}_1, \mathbf{u}_2, \ldots, \mathbf{u}_r$ be
elements of V. Then they are said to be linearly dependent if there exist scalars $a_1,
a_2, \ldots, a_r$ in K, not all zero, such that

$$a_1\mathbf{u}_1 + a_2\mathbf{u}_2 + \cdots + a_r\mathbf{u}_r = \mathbf{0}$$

In the contrary case the vectors $\mathbf{u}_1, \ldots, \mathbf{u}_r$ are said to be linearly independent.


EXAMPLE The burden of Exercise 2, Sec. 4, is to show that (2, 1, 3) and (2, −1, 4)
in $\mathbf{Q}_3$ are linearly independent. In Exercise 3 one is supposed to show that (1, 2, 2),
(3, 0, 4), and (5, −2, 6) are linearly dependent.

Suppose now that given vectors $u_1, u_2, \ldots, u_r$ in V are linearly dependent.
Then according to the definition there are scalars $a_1, a_2, \ldots, a_r$, not all zero,
such that $a_1u_1 + a_2u_2 + \cdots + a_ru_r = 0$. Suppose, for example, that $a_1 \neq 0$.
Then we can multiply the equation by $1/a_1$, getting

$u_1 = -\left(\frac{a_2}{a_1}u_2 + \frac{a_3}{a_1}u_3 + \cdots + \frac{a_r}{a_1}u_r\right)$

We conclude at once that $u_1, u_2, \ldots, u_r$ are linearly dependent if and only if
one of them can be expressed as a linear combination of the remaining ones, with
coefficients in the field K. [Note: By a "linear combination of elements $w_1, w_2,
\ldots, w_n$ in V with coefficients in K" we mean an expression of the type
$c_1w_1 + c_2w_2 + \cdots + c_nw_n$, where $c_1, c_2, \ldots, c_n$ are elements of K. Such an
expression stands on the right-hand side of the equation above for $u_1$.]

The advantage of the formulation of linear dependence in Definition 6.1 is that
it allows us to say that at least one of the vectors $v_1, v_2, \ldots, v_r$ can be expressed
as a linear combination of the others (with coefficients in K), without requiring
us to say just which one(s) can be so expressed.


DEFINITION 6.2 Let $w, u_1, u_2, \ldots, u_r$ be vectors in V. Then w is said to be linearly
dependent on $u_1, \ldots, u_r$ if it can be expressed as a linear combination
$w = c_1u_1 + \cdots + c_ru_r$.

The following proposition is used repeatedly. It is an immediate consequence
of Definition 6.1, and its proof is left as an exercise.

PROPOSITION 6.1 Let $v_1, v_2, \ldots, v_r$ be linearly independent elements in a vector
space V over a field K. Then an equation

$a_1v_1 + a_2v_2 + \cdots + a_rv_r = b_1v_1 + b_2v_2 + \cdots + b_rv_r$

can hold (with $a_1, \ldots, a_r$ and $b_1, \ldots, b_r$ in K) if and only if $a_1 = b_1$, $a_2 = b_2$,
$\ldots$, $a_r = b_r$.

DEFINITION 6.3 Infinitely many elements $v_1, v_2, v_3, \ldots$ in a vector space V are
said to be linearly dependent if the vectors in some finite subset of them are linearly
dependent. Otherwise the vectors $v_1, v_2, v_3$, etc., are said to be linearly independent.


† It often happens that a vector space V over a field K is also a vector space over another
field K′. For example, a vector space V over C is also a vector space over R and Q, as is
easily seen. In this situation it is essential to be very explicit about which field is under
consideration. One frequently uses such terms as "linearly dependent over K" to avoid
any possible confusion.




DEFINITION 6.4 Let V be a vector space over a field K. Then V is said to have
dimension n (where n is an integer ≥ 0) if there exists in V a set of n linearly independent
vectors and if there exists no set of more than n linearly independent vectors. In this
case we write dim V = n. If V contains a set of infinitely many linearly independent
vectors, as defined above, then we write dim V = ∞.†

Observe that dim V = 0 means that V must consist of the zero element alone.
The preceding definitions are very important, and they will be used constantly.


EXERCISES 
1. Let $v_1, v_2, \ldots, v_r$ and $w_1, w_2, \ldots, w_s$ be elements in a vector space V.
Prove that the subspace of V generated by $v_1, v_2, \ldots, v_r$ is the same as the
subspace generated by $v_1, v_2, \ldots, v_r, w_1, w_2, \ldots, w_s$ if and only if each of the
w's is linearly dependent on $v_1, \ldots, v_r$.


2. K being a field, prove that $\dim K_n = n$ for n = 1 and n = 2 (the proof of
this for all positive integers n is left for the next section). Prove that $\dim K_\infty = \infty$.


7. A theorem on linear equations


In order to prove some important theorems about the dimension of vector spaces 
we need the following auxiliary lemma: 


LEMMA 7.1 Let K be a field, and let

7.1    $a_{11}X_1 + a_{12}X_2 + \cdots + a_{1n}X_n = 0$
       $a_{21}X_1 + a_{22}X_2 + \cdots + a_{2n}X_n = 0$
       $\cdots$
       $a_{m1}X_1 + a_{m2}X_2 + \cdots + a_{mn}X_n = 0$

be a set of m linear equations in the unknowns $X_1, X_2, \ldots, X_n$ with coefficients $a_{11}$,
$a_{12}$, etc., in K.‡ If n > m, then there exist n elements $c_1, c_2, \ldots, c_n$ in K, not all
zero, which satisfy the equations when substituted for $X_1, X_2, \ldots, X_n$, respectively.
REMARK. Obviously if we put 0 for $X_1, \ldots, X_n$, then Eqs. (7.1) will be satisfied.
We call this the trivial solution. The theorem says that there is a nontrivial
solution if n > m. Observe that if we put

$a_j = (a_{1j}, a_{2j}, \ldots, a_{mj})$

for j = 1, 2, ..., n, then the $a_j$ are vectors in $K_m$ (see Sec. 4). The theorem
says simply that there are elements $c_1, c_2, \ldots, c_n$ in K, not all zero, such that
$c_1a_1 + c_2a_2 + \cdots + c_na_n = 0$, if n > m. In other words $a_1, a_2, \ldots, a_n$ must
be linearly dependent if n > m.

† If it is desirable to be explicit about the field of scalars involved, one speaks of the
dimension of V over K, writing $\dim_K V$. For example $\dim_R C = 2$, as is easily seen, but
$\dim_Q C = \infty$ (the latter is not so easy to prove).

‡ The double indices simply indicate the row and column in which the coefficient appears.
This kind of notation is very convenient and will be used frequently.


Proof. We use mathematical induction to prove the theorem. In
order to avoid a somewhat cumbersome double induction on both m and
n, we observe that it suffices to prove the theorem for the largest possible
value of m, namely n − 1. For if m < n − 1 in (7.1), then we can
simply add on more equations (n − 1 − m of them) in which the
coefficients are all zero, bringing the total number of equations up to n − 1.
Writing that m = n − 1, or n = m + 1, in (7.1), that system is

7.2    $a_{11}X_1 + a_{12}X_2 + \cdots + a_{1,m+1}X_{m+1} = 0$
       $a_{21}X_1 + a_{22}X_2 + \cdots + a_{2,m+1}X_{m+1} = 0$
       $\cdots$
       $a_{m1}X_1 + a_{m2}X_2 + \cdots + a_{m,m+1}X_{m+1} = 0$

We now use induction on m. The theorem is clearly true if m = 1, for
then there is just one equation $a_{11}X_1 + a_{12}X_2 = 0$ in two unknowns. If
$a_{11}$ and $a_{12}$ are both zero, then we can take any elements in K for $X_1$ and
$X_2$. If $a_{11}$, $a_{12}$ are not both zero, then we have merely to put $X_1 = a_{12}$
and $X_2 = -a_{11}$ to get a nontrivial solution.

We now assume that the theorem holds for some value $m_0$ of m, and we
show that it must necessarily hold for the next value $m = m_0 + 1$. That
will establish the theorem. Now if all the coefficients in (7.2) are zero,
it is obvious that any values in K for $X_1, \ldots, X_{m+1}$ will satisfy the
equations, and so the theorem is true in that case. If on the other hand
not all the coefficients $a_{ij}$ are zero in (7.2), then we can assume for
simplicity (by renumbering the equations and unknowns if necessary) that
the last coefficient $a_{m,m+1}$ is not zero. Then we can solve the last equation
for $X_{m+1}$, getting (we write b for $a_{m,m+1}$)

7.3    $X_{m+1} = -\frac{1}{b}\,(a_{m1}X_1 + \cdots + a_{mm}X_m)$

Now substitute this expression for $X_{m+1}$ in the first m − 1 equations.
Then, e.g., the ith equation becomes

7.4    $\left(a_{i1} - \frac{a_{i,m+1}\,a_{m1}}{b}\right)X_1 + \cdots + \left(a_{im} - \frac{a_{i,m+1}\,a_{mm}}{b}\right)X_m = 0$    (i = 1, ..., m − 1)

In this way we end up with m − 1 equations in the m unknowns $X_1, \ldots, X_m$.
Therefore by our induction hypothesis Eqs. (7.4) have a nontrivial
solution $X_1 = c_1, \ldots, X_m = c_m$ in K. Substituting these in (7.3) we
get a value $c_{m+1}$ for $X_{m+1}$, and it is immediate that the elements $c_1, c_2,
\ldots, c_{m+1}$ of K so obtained satisfy the original Eqs. (7.2). Q.E.D.
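The elimination step in the proof can be turned into a short recursive procedure. The Python sketch below is an illustration under stated assumptions rather than a transcription of the proof: the function name `nontrivial_solution` is invented here, the field is taken to be Q (via `fractions.Fraction`), and instead of padding with zero equations and renumbering, the code drops all-zero equations and eliminates whichever unknown has a nonzero coefficient in the last remaining equation.

```python
from fractions import Fraction

def nontrivial_solution(rows, n):
    """Return [c_1, ..., c_n], not all zero, satisfying every homogeneous
    equation in `rows` (lists of n coefficients); requires n > len(rows)."""
    rows = [[Fraction(a) for a in row] for row in rows]
    if n <= len(rows):
        raise ValueError("need more unknowns than equations (n > m)")
    # Equations with all coefficients zero hold for any values; discard them.
    rows = [row for row in rows if any(row)]
    if not rows:
        return [Fraction(1)] + [Fraction(0)] * (n - 1)
    # Choose an unknown X_k with nonzero coefficient b in the last equation.
    last = rows[-1]
    k = next(j for j in range(n) if last[j] != 0)
    b = last[k]
    # Substitute X_k = -(1/b) * (other terms) into the remaining equations,
    # which removes X_k from them, compare (7.3) and (7.4).
    reduced = []
    for row in rows[:-1]:
        new = [row[j] - row[k] * last[j] / b for j in range(n)]
        reduced.append(new[:k] + new[k + 1:])
    # Induction step: one equation fewer, one unknown fewer.
    c = nontrivial_solution(reduced, n - 1)
    # Recover the value of X_k from the last equation.
    others = last[:k] + last[k + 1:]
    ck = -sum(a * x for a, x in zip(others, c)) / b
    return c[:k] + [ck] + c[k:]

# Two equations in three unknowns (an example chosen for this sketch).
system = [(1, 2, -1), (2, -1, 3)]
c = nontrivial_solution(system, 3)
print(c)
print(all(sum(a * x for a, x in zip(row, c)) == 0 for row in system))
```

The recursion mirrors the induction: each call either meets a system with no nonzero equations, where any nonzero tuple works, or removes one equation and one unknown before calling itself.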


EXERCISES 


1. Take for K the field of rational numbers Q. Find a nontrivial solution
$c_1, c_2, c_3, c_4$ in Q for the system


2X, —-3X,+ Xi—- X,=0 
X, + 5X, —2X:4+8X,=0 
X,+ X,-5X,=0 
2. Describe the set of all solutions $(c_1, c_2, c_3, c_4)$ in Q of the system above. What
happens if Q is replaced by R?

3. Prove that if K is any field and n any positive integer, then the vector space
$K_n$ defined in Sec. 4 has dimension n.

4. Prove that Lemma 7.1 is true if K is assumed to be an integral domain, not
necessarily a field.

5. Referring to (7.1), prove that all n-tuples $(c_1, c_2, \ldots, c_n)$ of elements of
K which are solutions of the system of equations form a subspace of $K_n$.

6. Prove that if the system

$aX_1 + bX_2 = 0$
$cX_1 + dX_2 = 0$

(with coefficients in a field K) has a nontrivial solution, then the vectors (a, b)
and (c, d) in $K_2$ must be linearly dependent.


8. On the dimension of vector spaces 


The main purpose for which we need Lemma 7.1 is the proof of the following 
theorem: 


THEOREM 8.1 Let $v_1, v_2, \ldots, v_r$ be elements of a vector space V over a field K. Let
U be the subspace of V generated by those vectors (see Proposition 5.1). Then dim U ≤ r,
and equality holds if and only if $v_1, \ldots, v_r$ are linearly independent.


Proof. We show that if s > r, then any s vectors $u_1, \ldots, u_s$ in U
must be linearly dependent. It then follows from Definition 6.3 that
dim U ≤ r.

First of all, according to Proposition 5.1, the subspace U consists of all
elements

8.1    $u = a_1v_1 + a_2v_2 + \cdots + a_rv_r$

where $a_1, \ldots, a_r$ are arbitrary scalars. Hence, given s vectors $u_1, \ldots, u_s$
in U, each one of them can be expressed in this form, say

8.2    $u_i = a_{1i}v_1 + a_{2i}v_2 + \cdots + a_{ri}v_r$    (i = 1, ..., s)

where $a_{1i}$, $a_{2i}$, etc., are in K. Now if s > r, then by Lemma 7.1 we can
find elements $c_1, c_2, \ldots, c_s$ in K, not all zero, such that

8.3    $a_{11}c_1 + a_{12}c_2 + \cdots + a_{1s}c_s = 0$
       $a_{21}c_1 + a_{22}c_2 + \cdots + a_{2s}c_s = 0$
       $\cdots$
       $a_{r1}c_1 + a_{r2}c_2 + \cdots + a_{rs}c_s = 0$

But then from (8.2) we get

$c_1u_1 + \cdots + c_su_s = c_1(a_{11}v_1 + \cdots + a_{r1}v_r) + \cdots + c_s(a_{1s}v_1 + \cdots + a_{rs}v_r)$
$\qquad = (a_{11}c_1 + \cdots + a_{1s}c_s)\,v_1 + \cdots + (a_{r1}c_1 + \cdots + a_{rs}c_s)\,v_r$
$\qquad = 0$

by (8.3), showing that $u_1, \ldots, u_s$ are linearly dependent if s > r.

If $v_1, \ldots, v_r$ are linearly independent, then it follows at once from
Definition 6.3 that dim U = r. We now show that if $v_1, \ldots, v_r$ are
linearly dependent, then dim U < r, and that will complete the proof of
the theorem. If the v's are linearly dependent, then one of them (for
simplicity of notation say $v_r$) can be expressed as a linear combination of
the others $v_1, v_2, \ldots, v_{r-1}$. But then $v_1, v_2, \ldots, v_{r-1}$ generate the same
subspace U as do $v_1, v_2, \ldots, v_r$ (see Exercise 1, Sec. 6). By what has
just been shown it follows that dim U ≤ r − 1 < r. Q.E.D.
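For vectors in $Q_n$ the dimension of the generated subspace can be computed by row reduction. That technique is not developed in this section, so the following Python sketch should be read as an outside illustration of the statement dim U ≤ r, with equality exactly for independent generators. The function name `dim_span` and the sample vectors are assumptions of the sketch.

```python
from fractions import Fraction

def dim_span(vectors):
    """Dimension of the subspace of Q_n generated by the given vectors,
    computed as the number of pivots found by Gaussian elimination."""
    rows = [[Fraction(x) for x in v] for v in vectors]
    ncols = len(rows[0]) if rows else 0
    r = 0  # number of pivot rows found so far
    for col in range(ncols):
        # Look for a row at or below position r with a nonzero entry here.
        pivot = next((i for i in range(r, len(rows)) if rows[i][col] != 0), None)
        if pivot is None:
            continue
        rows[r], rows[pivot] = rows[pivot], rows[r]
        for i in range(len(rows)):
            if i != r and rows[i][col] != 0:
                factor = rows[i][col] / rows[r][col]
                rows[i] = [a - factor * b for a, b in zip(rows[i], rows[r])]
        r += 1
    return r

v1, v2, v3 = (1, 0, 1), (0, 1, 1), (1, 1, 2)   # v3 = v1 + v2

print(dim_span([v1, v2]))      # two independent generators
print(dim_span([v1, v2, v3]))  # three dependent generators, same subspace
```

Both calls describe the same subspace U: adding the dependent vector $v_3$ raises r from 2 to 3 but leaves dim U unchanged, so the inequality dim U ≤ r becomes strict.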


EXERCISES 

1. Let U be a subspace of a vector space V of finite dimension. Show that
dim U ≤ dim V, and show that equality holds if and only if U = V.

2. Show by an example that it is possible to have a subspace U of a vector
space V such that U ≠ V but dim U = dim V = ∞.

3. What is the dimension of the subspace of $K_3$ defined at the end of Exercise 7,
Sec. 5?

*4. Let V be a vector space, and let $U_1$ and $U_2$ be subspaces. Let V′ be the
subspace of V generated by the elements of both $U_1$ and $U_2$ (see Exercise 5, Sec. 5),
and let V″ be the set of elements common to both $U_1$ and $U_2$. Assuming that
V″ has finite dimension, prove that dim V′ + dim V″ = dim $U_1$ + dim $U_2$.


9. Base vectors 


Let V be a vector space of finite dimension n over a field K. The purpose of this
paragraph is to show that in a certain sense the elements of V can be represented
by elements of K, or more precisely, by n-tuples of scalars. This will provide us
with a very important way of translating vector-space calculations into calculations
with scalars.

By Definition 6.3, if dim V = n, then there must exist in V a set of n linearly
independent vectors, say $v_1, v_2, \ldots, v_n$ (we shall soon see that there are in general
infinitely many such sets). Now let x be any vector in V. The set $x, v_1, \ldots, v_n$
then contains n + 1 vectors, and they must therefore be linearly dependent
(by definition of dimension), say

$c_0x + c_1v_1 + c_2v_2 + \cdots + c_nv_n = 0$

where $c_0, c_1, \ldots, c_n$ are elements of K, not all zero. Now $c_0$ cannot be zero, for
otherwise this equation would imply that $v_1, \ldots, v_n$ are linearly dependent.
Hence we can solve the equation for x. Writing $x_i$ for $-c_i/c_0$, we get

$x = x_1v_1 + x_2v_2 + \cdots + x_nv_n$


DEFINITION 9.1 Let V be a vector space of finite dimension n over a field K. Then
any ordered set of n linearly independent vectors $v_1, v_2, \ldots, v_n$ in V is called a
base for V.

We shall often indicate a base $v_1, \ldots, v_n$ by a symbol such as $\{v_i\}$.


From the remarks above we have at once the following theorem. 


THEOREM 9.1 If $v_1, v_2, \ldots, v_n$ form a base for a vector space V over a field K, then
any element x in V can be expressed in one and only one way as a linear combination

9.1    $x = x_1v_1 + x_2v_2 + \cdots + x_nv_n$

with $x_1, x_2, \ldots, x_n$ in K.

Uniqueness follows from Proposition 6.1.


DEFINITION 9.2 The scalars $x_1, x_2, \ldots, x_n$ in (9.1) are called the components of
x relative to the base $\{v_i\} = v_1, \ldots, v_n$ of V.

Theorem 9.1 tells us that once we select a base $\{v_i\}$ in V, then for every vector x
we get an n-tuple $(x_1, x_2, \ldots, x_n)$ of scalars, that is, the components of x. The
operation leading from x to its components is therefore a mapping from V to $K_n$.
Calling the mapping T, we have then by definition

9.2    $T(x) = (x_1, \ldots, x_n)$  if  $x = x_1v_1 + \cdots + x_nv_n$


THEOREM 9.2 The mapping T from V to $K_n$ defined by (9.2) is a one-to-one mapping,
and it is compatible with vector addition and scalar multiplication. That is,

9.3    $T(x + y) = T(x) + T(y)$  and  $T(c \cdot x) = c \cdot T(x)$

for any vectors x and y in V and any scalar c in K.


Figure 3a  $v_1$, $v_2$ form a base of $R_2$.

Figure 3b  $w_1$, $-2w_1$ do not form a base of $R_2$.


Proof. First of all, if $(x_1, \ldots, x_n)$ is any element of $K_n$, then
$x = x_1v_1 + \cdots + x_nv_n$ is an element of V, by definition of a vector space, and
for that vector x we have $T(x) = (x_1, \ldots, x_n)$. Therefore, given any
element of $K_n$, there is at least one element of V which is sent into it by T.
But T cannot send two different elements x, y of V into the same element
of $K_n$. For if T(x) = T(y), then x and y must have the same components
(relative to the given base in V), and so x = y, by Theorem 9.1.

To complete the proof we must verify the two equations (9.3). Suppose
then that $x = x_1v_1 + \cdots + x_nv_n$ and $y = y_1v_1 + \cdots + y_nv_n$. Then
$x + y = (x_1 + y_1) \cdot v_1 + \cdots + (x_n + y_n) \cdot v_n$, and so by definition of T
we have

$T(x + y) = (x_1 + y_1, \ldots, x_n + y_n)$

By definition [Eq. (4.1), Sec. 4] the right-hand side is

$(x_1, \ldots, x_n) + (y_1, \ldots, y_n) = T(x) + T(y)$

The second equation $T(cx) = c \cdot T(x)$ is verified similarly. Q.E.D.
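To make the mapping T concrete, the Python sketch below computes components relative to a base of $Q_2$ and checks the two equations (9.3) on sample vectors. The function name `components`, the use of Gauss-Jordan elimination to solve for the components, and the particular base (1, 1), (1, −1) are all assumptions of this illustration.

```python
from fractions import Fraction

def components(base, x):
    """Components (x_1, ..., x_n) of x relative to the base v_1, ..., v_n of Q_n,
    found by solving x = x_1*v_1 + ... + x_n*v_n with Gauss-Jordan elimination."""
    n = len(base)
    # Row k of the augmented matrix is the k-th coordinate equation.
    m = [[Fraction(base[j][k]) for j in range(n)] + [Fraction(x[k])]
         for k in range(n)]
    for col in range(n):
        # A pivot always exists when the given vectors really form a base.
        pivot = next(i for i in range(col, n) if m[i][col] != 0)
        m[col], m[pivot] = m[pivot], m[col]
        m[col] = [a / m[col][col] for a in m[col]]
        for i in range(n):
            if i != col:
                f = m[i][col]
                m[i] = [a - f * b for a, b in zip(m[i], m[col])]
    return tuple(row[n] for row in m)

base = [(1, 1), (1, -1)]
x, y, c = (5, 1), (0, 2), 3

Tx, Ty = components(base, x), components(base, y)
T_sum = components(base, tuple(a + b for a, b in zip(x, y)))
T_scaled = components(base, tuple(c * a for a in x))

print(Tx, Ty)
print(T_sum == tuple(a + b for a, b in zip(Tx, Ty)))   # T(x + y) = T(x) + T(y)
print(T_scaled == tuple(c * a for a in Tx))            # T(c x) = c T(x)
```

Here x = 3·(1, 1) + 2·(1, −1), so its components relative to this base are (3, 2), which differ from its entries (5, 1).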
COROLLARY Vectors $u_1, \ldots, u_s$ in V are linearly independent if and only if the
corresponding vectors $T(u_1), \ldots, T(u_s)$ in $K_n$ are so.

Proof. If $c_1u_1 + \cdots + c_su_s = 0$ for certain scalars $c_1, \ldots, c_s$
(not all zero), then (since T sends 0 in V into 0 in $K_n$) we have
$T(c_1u_1 + \cdots + c_su_s) = 0$. Using (9.3) repeatedly, the left-hand side is
equal to $T(c_1u_1) + \cdots + T(c_su_s) = c_1 \cdot T(u_1) + \cdots + c_s \cdot T(u_s) = 0$,
and so $T(u_1), \ldots, T(u_s)$ are linearly dependent. The converse is
established by just reversing the argument.


REMARK 1. The mapping $T: V \to K_n$ just discussed is called an isomorphism because
it establishes a one-to-one correspondence between the two vector spaces
which preserves vector-space operations. Isomorphisms of vector spaces will be
discussed in detail in the next chapter.

Theorem 9.2 shows that V and $K_n$ have precisely the same structure as vector
spaces. Therefore $K_n$ can be considered as a kind of model for all n-dimensional
vector spaces over K.

Observe that the mapping T of Theorem 9.2 depends in an essential way upon
the choice of base $v_1, \ldots, v_n$ in V (the effect of a change of base is calculated
below). There are many possible choices of bases, and therefore the correspondence
is in no way unique.


REMARK 2. Under certain circumstances it is possible to extend the concept of
base to infinite-dimensional spaces. This is of utmost importance for many practical
applications of vector spaces. The question is discussed briefly in Chap. 14.


REMARK 3. The vectors $e_1 = (1, 0, 0, \ldots, 0)$, $e_2 = (0, 1, 0, \ldots, 0)$, $\ldots$,
$e_n = (0, 0, \ldots, 0, 1)$ in $K_n$ are easily seen to form a base. We shall call it the
canonical base for $K_n$. If $x = (x_1, \ldots, x_n)$ is any element of $K_n$, then
$x = x_1 \cdot e_1 + x_2 \cdot e_2 + \cdots + x_n \cdot e_n$. Thus the scalars $x_1, \ldots, x_n$ are precisely the
components of x relative to the canonical base. In general there is no "preferred"
base in a vector space, and $K_n$ is exceptional in this respect.


The following theorem is sometimes useful: 


THEOREM 9.3 Let V be a vector space of finite dimension n, and let $v_1, v_2, \ldots, v_r$ be
linearly independent vectors in V. Then there exist n − r vectors $v_{r+1}, \ldots, v_n$ in V
such that the set $v_1, \ldots, v_n$ is a base for V.

Proof. If r < n, then there is at least one vector $v_{r+1}$ in V not in the
subspace spanned by $v_1, \ldots, v_r$, by Theorem 8.1. Then $v_1, \ldots, v_r,
v_{r+1}$ must be linearly independent, clearly. If r + 1 < n, then we can
find a vector $v_{r+2}$ not in the subspace spanned by $v_1, \ldots, v_{r+1}$, and so on.
(We are of course really using mathematical induction, abbreviated by
"and so on.")
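The "and so on" of the proof is a loop, and for $V = Q_n$ it can be run directly. The Python sketch below is one possible realization, under assumptions not made in the text: candidates for the new vectors are drawn from the canonical base $e_1, \ldots, e_n$, and membership in the spanned subspace is tested by comparing ranks computed with row reduction. The names `rank` and `extend_to_base` are hypothetical.

```python
from fractions import Fraction

def rank(vectors):
    """Number of pivots found by Gaussian elimination over Q."""
    rows = [[Fraction(x) for x in v] for v in vectors]
    ncols = len(rows[0]) if rows else 0
    r = 0
    for col in range(ncols):
        pivot = next((i for i in range(r, len(rows)) if rows[i][col] != 0), None)
        if pivot is None:
            continue
        rows[r], rows[pivot] = rows[pivot], rows[r]
        for i in range(len(rows)):
            if i != r and rows[i][col] != 0:
                factor = rows[i][col] / rows[r][col]
                rows[i] = [a - factor * b for a, b in zip(rows[i], rows[r])]
        r += 1
    return r

def extend_to_base(vectors, n):
    """Extend linearly independent vectors of Q_n to a base of Q_n by adding
    canonical base vectors that lie outside the subspace spanned so far."""
    base = list(vectors)
    for k in range(n):
        e = tuple(1 if i == k else 0 for i in range(n))
        # e is outside the current span exactly when adding it raises the rank.
        if rank(base + [e]) > len(base):
            base.append(e)
    return base

print(extend_to_base([(1, 1, 0)], 3))
```

Restricting the search to canonical base vectors loses nothing in $Q_n$: if every $e_k$ already lay in the spanned subspace, that subspace would be all of $Q_n$.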
COROLLARY V being as above, let U be an r-dimensional subspace, with r < n. Then
there exists another subspace W of V, of dimension n − r, such that V is the direct
sum U ⊕ W of U and W (see Definition 5.2).

Proof. Choose a base $v_1, \ldots, v_r$ for U, and let $v_{r+1}, \ldots, v_n$ be as
in the theorem. Let W be the subspace of V spanned by the latter. Then
W has the required properties.


REMARK. The space W is not unique. 


We conclude this section with a calculation showing the effect of a change of
base in V upon the components of a vector. Let $\{v_i\} = v_1, \ldots, v_n$ and $\{v'_j\} =
v'_1, \ldots, v'_n$ be two different bases in V. Then by Theorem 9.1 any vector x in V
can be expressed in terms of either base, say

9.4    $x = x_1v_1 + \cdots + x_nv_n = x'_1v'_1 + \cdots + x'_nv'_n$

In particular, the elements $v'_1, \ldots, v'_n$ can be so expressed in terms of the old
base $\{v_i\}$. Say

9.5    $v'_j = a_{j1}v_1 + a_{j2}v_2 + \cdots + a_{jn}v_n$    (j = 1, ..., n)

Substituting this in (9.4) we get

$x = x'_1(a_{11}v_1 + \cdots + a_{1n}v_n) + \cdots + x'_n(a_{n1}v_1 + \cdots + a_{nn}v_n)$
$\quad = (a_{11}x'_1 + \cdots + a_{n1}x'_n)\,v_1 + \cdots + (a_{1n}x'_1 + \cdots + a_{nn}x'_n)\,v_n$

Therefore, by Proposition 6.1, we have

9.6    $x_i = a_{1i}x'_1 + a_{2i}x'_2 + \cdots + a_{ni}x'_n$    (i = 1, ..., n)

which shows how the components $(x_1, \ldots, x_n)$ and $(x'_1, \ldots, x'_n)$ of x relative to
the two bases $\{v_i\}$, $\{v'_j\}$ are related. In Chap. 9 we shall develop some compact
notation for handling such systems of equations.
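The change-of-base formula can be checked numerically. In the Python sketch below the old base, the coefficients $a_{ji}$, and the new components are sample values chosen for illustration; `combine` is a hypothetical helper that forms a linear combination. The code builds the new base from (9.5), applies (9.6), and confirms that both sets of components describe the same vector as in (9.4).

```python
from fractions import Fraction

def combine(coeffs, vectors):
    """Return c_1*v_1 + ... + c_n*v_n for tuples v_i with rational entries."""
    n = len(vectors[0])
    return tuple(sum(Fraction(c) * v[k] for c, v in zip(coeffs, vectors))
                 for k in range(n))

v = [(1, 0), (1, 1)]      # old base v_1, v_2 of Q_2
a = [[2, 1], [1, 1]]      # a[j][i] holds a_{ji} of (9.5), with 0-based indices

# (9.5): v'_j = a_{j1} v_1 + a_{j2} v_2
v_new = [combine(a[j], v) for j in range(2)]

x_new = (3, -2)           # components x'_1, x'_2 relative to the new base

# (9.6): x_i = a_{1i} x'_1 + a_{2i} x'_2
x_old = tuple(sum(a[j][i] * x_new[j] for j in range(2)) for i in range(2))

print(v_new)
print(x_old)
print(combine(x_old, v) == combine(x_new, v_new))   # both sides of (9.4) agree
```

Note the index pattern: in (9.5) the sum runs over the second index of $a_{ji}$, while in (9.6) it runs over the first.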


EXERCISES 

1. Let $\{v_i\} = v_1, \ldots, v_n$ be a base for a vector space V. What are the
components of the base vectors themselves relative to that base? That is, where does
T of Theorem 9.2 send them?

2. Exhibit a base for the vector space K, (K a field). Find a base for the space 
K{¢), defined at the end of See. 4. 

3. Find a base for $Q_2$ different from the one given in Remark 3, Q being the
rational field. Write down the Eq. (9.5) connecting the two bases, and compute
the components of the vector (3, −2) relative to the bases. Verify (9.6) in this
case.

4. The vectors (1, 2, 0), (−1, 1, 3), and (0, 2, 4) form a base for $Q_3$. Find the
components of the vectors (2, 4, 2) and (6, 0, 1) relative to that base.

5. Let $v_1, \ldots, v_n$ be a base for V, and let $U_j$ be the subspace of V generated
by the vector $v_j$ (j = 1, ..., n). Prove that V is the direct sum of $U_1, \ldots, U_n$.

6. Let V be a vector space of finite dimension, and let $U_1, \ldots, U_r$ be subspaces
such that V is their direct sum. Prove that dim V = dim $U_1$ + ··· + dim $U_r$.

7. Let V be a vector space of finite dimension, and let $U_1$ and $U_2$ be two
subspaces. Let V′ be the subspace of V generated by all the elements of $U_1$ and
$U_2$; let V″ be the set of elements common to $U_1$ and $U_2$. Prove that dim V′ +
dim V″ = dim $U_1$ + dim $U_2$.

8. Find a base for the complex number field C considered as a vector space over
the real field R.

9. Let V be a vector space of dimension 3 over a field K, and let $v_1, v_2, v_3$ be a
base. Let $(b_1, b_2, b_3)$ be an element of $K_3$ different from 0. Let V′ be the subset
of V consisting of all vectors $x = x_1v_1 + x_2v_2 + x_3v_3$ whose components satisfy the
equation $b_1x_1 + b_2x_2 + b_3x_3 = 0$. Prove that V′ is a subspace of dimension 2.

*10. Let K be a subfield of a field L. Show that L, with its given operation of
addition, is a vector space over K if scalar multiplication c · x of an element c of K
and an element x of L is defined by the product in L (see Exercise 4, Sec. 3). The
dimension of L as vector space over K, if finite, is called the degree of L over K
and is denoted by [L:K]. Now assume that L is in turn a subfield of a third
field E such that E as vector space over L has finite dimension, denoted similarly
by [E:L]. Since K is a subfield of E, the degree [E:K] is also defined. Prove that
[E:K] = [E:L] · [L:K]. (The notion of degree defined here is of great importance
in the study of fields.) What is [C:R]?


10. Affine spaces


Everyone is familiar with the idea of representing points in the plane by means of
rectangular coordinates (x, y) or points in space by coordinates (x, y, z). Here
x, y, and z stand for real numbers, and so (x, y) is an element of the vector space
$R_2$; (x, y, z) is an element of $R_3$. Therefore the procedure of choosing coordinates
can be thought of intuitively as temporarily "pasting" a copy of $R_2$ onto the plane,
or of $R_3$ onto three-dimensional space. In order to achieve a workable formulation
of these notions and their consequences, we must first give some account of what
is meant by "the plane" and by "three-dimensional space."

Of course our physical insight gives us a rather clear idea of what those terms 
ought to imply. But just as in the case of the system of integers discussed in 
Chap. 2, it is desirable to avoid any reference to physical entities in making defini- 
tions. Therefore we propose to consider the plane and 3-space as abstract mathe- 
matical systems which in some sense reflect our intuitive apprehension of space 
but which are based upon an axiomatic foundation. 

One procedure would be to take as the plane (or 3-space) the systems described
by the axioms of euclidean geometry. Indeed that is the purpose for which those
axioms were devised. However it is somewhat simpler for us to base our definitions
of the plane and 3-space upon the real-number system R. In that way it is
also very easy to define analogous spaces of any number of dimensions.

One may wonder why we do not simply take the vector space $R_3$ for 3-space,
and similarly $R_1$ and $R_2$ for the line and plane, respectively. That is often done,
but the procedure suffers from an awkward drawback: For ease of discussion let
us designate by $E_3$ something that fits our intuitive notion of 3-space. Now $R_3$,
being a vector space, has an operation of addition. But of course we do not
envisage any such operation upon the points of $E_3$; nor do we think of $E_3$ as having
a distinguished point as "origin," like the zero element of $R_3$. Hence, $R_3$ possesses
some features which do not agree with our idea of $E_3$. (Similar remarks apply to
$R_1$ and $R_2$ as candidates for the line and plane, respectively.)

Now our idea of what $E_3$ should be does include a certain kind of operation,
namely rigid displacement, and it is by means of that notion that we shall be able
to define $E_3$ from $R_3$.


DEFINITION 10.1 Let A be a set of elements, and let T(A) be a set of one-to-one
mappings of A to itself. Then the system consisting of A and T(A) is called an affine
space over a given field K if T(A) satisfies the following conditions:

(1) For each ordered pair of elements p, q in A there is one and only one mapping
in T(A) which sends p into q.

(2) T(A) with the binary operation of composition of mappings is an abelian
group.

(3) For each element c of the field K and each element t of T(A) there is assigned
another element ct in T(A).

(4) T(A), with the operations of (2) and (3), is a vector space over K.

The dimension of A is defined to be the same as that of the vector space T(A).
One often speaks of A itself as an affine space, T(A) being understood. But as
the axioms show, the essential structure required is primarily in T(A). A given
set A may be made into an affine space in many ways by changing the choice
of T(A).

Figure 4  (Top) A rigid displacement of the plane $E_2$ can be described intuitively as
the motion obtained upon regarding $E_2$ as a rigid solid and displacing it along the
indicated arrow. Points p and q signify points in the plane; arrow signifies the rigid
displacement from p to q. (Bottom) The rigid displacement of space from p to q
yields the same outcome as the rigid displacement from r to s.

Figure 5  Composition of rigid displacements: the result of rigidly displacing $E_2$
from p to q and then from q to r is the rigid displacement from p to r; cf. Eq. (10.2).

By axiom (2) the mappings in T(A) form a subgroup of the total group (call it
M(A)) of all one-to-one mappings of A to itself (see Theorem 6.3, Sec. 6, Chap. 1).
Now M(A) contains the identity mapping of A, and it is the (unique) identity
element of M(A). The identity element of the subgroup T(A) is necessarily the same
as that of M(A), and so T(A) must contain the identity mapping of A. The
inverse of an element t of T(A) is just the inverse mapping $t^{-1}$.

The elements of A will be called points, and the elements of T(A) will be called
translations on A. We recall from Chap. 1 that if $t_1$ and $t_2$ are two translations,
that is, elements of T(A), then their composition $t_1t_2$ is defined by $t_1t_2(p) = t_1(t_2(p))$
for any point p of A. In general, composition of mappings is not a commutative
operation, but in axiom (2) above we have assumed that it is so for the mappings
in T(A). Since we have used + as the notation for the binary operation in a
vector space, we shall now write $t_1 + t_2$ for the composition $t_1t_2 = t_2t_1$ of two
translations. Accordingly, the inverse of a mapping t in T(A) will now be denoted
by −t instead of $t^{-1}$, and the identity element of T(A), that is, the identity
mapping of A, will be denoted by 0.

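If p and q are two points of A, then by axiom (1) there is a unique translation
t in T(A) such that t(p) = q. It is sometimes convenient to denote that t by the
symbol $\overrightarrow{pq}$; it is called translation from p to q. In particular, the symbol
$\overrightarrow{pp}$ always represents the identity mapping 0 of A, for 0 certainly sends p into p,
and by axiom (1) it is the only translation that does so.

For two points p and q of A we have two translations $\overrightarrow{pq}$ and $\overrightarrow{qp}$. The first
(by definition) sends p into q, and the second sends q into p. Therefore the
composition $\overrightarrow{qp} + \overrightarrow{pq}$ sends p → q → p; that is, it sends p into p and is therefore the
identity mapping 0. Thus $\overrightarrow{qp} + \overrightarrow{pq} = 0$, or

10.1    $\overrightarrow{qp} = -\overrightarrow{pq}$

In other words, $\overrightarrow{qp}$ is the inverse mapping of $\overrightarrow{pq}$. If p, q, and r are any three points
of A, then $\overrightarrow{pq} + \overrightarrow{qr}$ sends p → q → r, and so

10.2    $\overrightarrow{pq} + \overrightarrow{qr} = \overrightarrow{pr}$

Using (10.1) we get the alternate form of (10.2)

10.3    $\overrightarrow{pq} - \overrightarrow{rq} = \overrightarrow{pr}$

For four points p, q, r, s,

10.4    $\overrightarrow{pq} = \overrightarrow{rs}$  if and only if  $\overrightarrow{pr} = \overrightarrow{qs}$

Assuming the first equality we get from (10.3)
$\overrightarrow{pr} = \overrightarrow{pq} - \overrightarrow{rq} = \overrightarrow{rs} + \overrightarrow{qr} = \overrightarrow{qs}$,
which is the second equation. The converse is proved in the same way.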

If t is any translation in T(A), and p any point of A, then setting q = t(p) we
have $t = \overrightarrow{pq}$; q is uniquely determined by t (once p is fixed), and conversely. Hence
any element of T(A) can be written in the form $\overrightarrow{pq}$, and p can be arbitrarily chosen
in advance. We shall often use this fact. From what has just been pointed out
we have the following theorem:

Figure 6  $\overrightarrow{pq} = \overrightarrow{rs}$. Geometrically, $\overrightarrow{pq} = \overrightarrow{rs}$ if and only if these translations
have the same length and direction. Thus the interpretation of a vector as a
translation of affine space puts the intuitive notion of "free vector" (by contrast
with "bound vector") on a firm footing.


THEOREM 10.1 A and T(A) being as in Definition 10.1, let $p_0$ be a point of A. Then
the operation that assigns the translation $\overrightarrow{p_0q}$ to any point q of A is a one-to-one
mapping of A to T(A).

The mapping just defined sends $p_0$ into $\overrightarrow{p_0p_0} = 0$.


EXAMPLE 1 Theorem 10.1 suggests a way of constructing an affine space from a
vector space. Let V be a vector space over a field K. For A let us take the set V
itself, without the vector-space operations; and for T(A) again take V, this time
with its vector-space operations. If now x is an element of A (= V), and if v is an
element of T(A) (= V), then we define the translation of x by v to be the element
x + v of A. It is easily verified that the requirements of Definition 10.1 are
satisfied.
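The construction of Example 1 is easy to experiment with when V = $Q_2$. In the Python sketch below points and translations are both stored as pairs of rationals, but only translations are ever added to one another. The function names `translate`, `arrow`, `add`, and `neg` are invented for this illustration, and the sample points are arbitrary.

```python
from fractions import Fraction

def translate(x, v):
    """Apply the translation v in T(A) = Q_2 to the point x of A = Q_2."""
    return tuple(Fraction(a) + Fraction(b) for a, b in zip(x, v))

def arrow(p, q):
    """The unique translation sending the point p into the point q."""
    return tuple(Fraction(b) - Fraction(a) for a, b in zip(p, q))

def add(t1, t2):
    """Composition of two translations, written t1 + t2."""
    return tuple(a + b for a, b in zip(t1, t2))

def neg(t):
    """Inverse translation, written -t."""
    return tuple(-a for a in t)

p, q, r = (1, 2), (4, 0), (-3, 5)

assert translate(p, arrow(p, q)) == q                   # axiom (1)
assert arrow(p, p) == (0, 0)                            # pp is the identity 0
assert arrow(q, p) == neg(arrow(p, q))                  # (10.1)
assert add(arrow(p, q), arrow(q, r)) == arrow(p, r)     # (10.2)
assert add(arrow(p, q), neg(arrow(r, q))) == arrow(p, r)  # (10.3)

s = translate(r, arrow(p, q))                           # chosen so that rs = pq
assert arrow(r, s) == arrow(p, q) and arrow(p, r) == arrow(q, s)  # (10.4)
print("identities (10.1) to (10.4) hold for the sample points")
```

Nothing in the code ever adds two points; the only operations on points are `translate` and `arrow`, which is the distinction between A and T(A) that the example draws.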

Hence a vector space can always be regarded as an affine space. But an affine
space cannot in general be regarded as a vector space. Roughly speaking, an
affine space can be thought of as a vector space deprived of a fixed "origin."
Theorem 10.1 shows that if we fix a point in A as origin, then we get a one-to-one
correspondence between A and the vector space T(A) which sends that point into
the zero element of T(A). In this way we can transfer questions concerning the
set A into analogous questions concerning the vector space T(A).

According to Definition 10.1, dim A = dim T(A). Since a vector space of
dimension zero contains only one element, it follows from Theorem 10.1 that an
affine space of dimension zero consists of a single point. An affine space of
dimension 1 is called a line; an affine space of dimension 2 is called a plane.


DEFINITION 10.2 Let A and T(A) be as in Definition 10.1, and let A′ be a subset of A.
Then A′ is called an affine subspace of A if there is a subspace V of T(A) and a
point $p_0$ in A′ such that q is in A′ if and only if $\overrightarrow{p_0q}$ is in V. In other words, A′ is
the set of points obtained by applying to $p_0$ all the translations in V.


Figure 7  The affine lines A′ and B′ are both associated with the vector space V.
(Labels in the figure: dim V = 1, T(A) = V = T(B).)


Any point p of A is an affine subspace (of dimension zero), corresponding to the
subspace V of T(A) consisting of the element 0 alone. And A is an affine subspace
of itself, obtained by taking V = T(A) in the definition.

Now let A′ be as in the definition. We claim that every translation in V sends
A′ onto itself. For let q be any point of A′, and let $\overrightarrow{qr}$ be any translation in V (it
can be so written, by the remarks preceding Theorem 10.1). We must show that
r is a point of A′. Now by assumption $\overrightarrow{p_0q}$ is in V, and therefore so is
$\overrightarrow{p_0q} + \overrightarrow{qr} = \overrightarrow{p_0r}$ [using (10.2)], and so r is in A′ by Definition 10.2.

Now let p be any point of A′. Then $\overrightarrow{pq}$ is in V if and only if q is in A′. For
$\overrightarrow{pq} = \overrightarrow{p_0q} - \overrightarrow{p_0p}$, by (10.3), and $\overrightarrow{p_0p}$ is in V by assumption. Since V is a subspace
of T(A), it follows that $\overrightarrow{pq}$ is in V if and only if $\overrightarrow{p_0q}$ is in V, and that is so if and only
if q is in A′, by Definition 10.2. Therefore $p_0$ in Definition 10.2 can be replaced
by any other point p of A′. And if we set T(A′) = V, then A′ and T(A′) satisfy†
Definition 10.1. That is, an affine subspace of A is an affine space in its own right.

Let us apply these remarks to the problem of determining the lines in A (i.e.,
affine subspaces of dimension 1). Let $p_0$ and $p_1$ be any two distinct points of A.
Then $\overrightarrow{p_0p_1}$ is a nonzero element of T(A), and all the translations $x \cdot \overrightarrow{p_0p_1}$ (x an
arbitrary scalar in K) form a one-dimensional subspace V of T(A). For each x
there is a unique point q in A such that $\overrightarrow{p_0q} = x \cdot \overrightarrow{p_0p_1}$. According to Definition
10.2, all such points q form a line L in A, and by the remarks above, T(L) = V.
We can thus describe L by the equation

10.5    $\overrightarrow{p_0q} = x \cdot \overrightarrow{p_0p_1}$    (x an arbitrary scalar in K)


The correspondence between q and x is one-to-one. That is, the points of L are
in one-to-one correspondence with the elements of K. If conversely L is a line in
A according to Definition 10.1, corresponding to some subspace V of T(A), and
if $p_0$ and $p_1$ are any two points of it, then by what was pointed out above the
translation $\overrightarrow{p_0p_1}$ must be in V. If $p_0 \neq p_1$, then $\overrightarrow{p_0p_1}$ must be a base for V, since V
is of dimension 1, by assumption. Therefore every element of V can be expressed
uniquely in the form $x \cdot \overrightarrow{p_0p_1}$, where x is in K, and conversely every element of
that form is in V. Therefore we arrive again at (10.5).
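Equation (10.5) can be read as a parametrization of the line through two distinct points. The Python sketch below works in the affine space built from $Q_2$ as in Example 1; the names `point_on_line` and `parameter` and the sample points are assumptions of this illustration. The first function produces the point q belonging to a scalar x, and the second recovers x from q, or reports that q is not on the line.

```python
from fractions import Fraction

def point_on_line(p0, p1, x):
    """The point q determined by p0q = x * p0p1, as in (10.5)."""
    x = Fraction(x)
    return tuple(Fraction(a) + x * (Fraction(b) - Fraction(a))
                 for a, b in zip(p0, p1))

def parameter(p0, p1, q):
    """Return the scalar x with p0q = x * p0p1, or None if q is off the line.
    Assumes p0 and p1 are distinct points."""
    d = [Fraction(b) - Fraction(a) for a, b in zip(p0, p1)]   # p0p1
    w = [Fraction(b) - Fraction(a) for a, b in zip(p0, q)]    # p0q
    k = next(i for i, c in enumerate(d) if c != 0)  # exists since p0 != p1
    x = w[k] / d[k]
    return x if all(wi == x * di for wi, di in zip(w, d)) else None

p0, p1 = (1, 1), (3, 2)

print(point_on_line(p0, p1, 0) == p0)   # x = 0 gives p0
print(point_on_line(p0, p1, 1) == p1)   # x = 1 gives p1
print(parameter(p0, p1, point_on_line(p0, p1, 2)))
print(parameter(p0, p1, (0, 0)))        # (0, 0) is not on this line
```

That `parameter` undoes `point_on_line` is the one-to-one correspondence between the points of L and the elements of K described above.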

In a similar way we can determine the planes in A. Let $p_0$, $p_1$, and $p_2$ be three
points in A, and assume they do not lie on a line in A. Then an equation of the
type (10.5), with $p_2$ in place of q, is impossible, and consequently the two vectors
$\overrightarrow{p_0 p_1}$ and $\overrightarrow{p_0 p_2}$ are linearly independent elements of T(A). They span a subspace
V of T(A) of dimension 2, and the elements of V can be expressed uniquely
in the form $x_1 \cdot \overrightarrow{p_0 p_1} + x_2 \cdot \overrightarrow{p_0 p_2}$, where $x_1$ and $x_2$ are scalars. For any choice
of $x_1$ and $x_2$ that translation can be expressed uniquely in the form $\overrightarrow{p_0 q}$, and
conversely if $\overrightarrow{p_0 q}$ is in V, then it can be expressed in the form
$x_1 \cdot \overrightarrow{p_0 p_1} + x_2 \cdot \overrightarrow{p_0 p_2}$.
† The elements of V are translations on the larger set A. Strictly speaking, we should let
T(A′) denote not V itself but the translations in V restricted to the smaller set A′. This
distinction is necessary for logical precision, but no harm will come from overlooking it.


Affine spaces 193 


Figure 8a  The three vertices of a triangle in the plane are independent points
if the triangle is not degenerate.

Figure 8b  Any four points in the plane are dependent, since any three vectors
in $R_2$ are linearly dependent.

According to Definition 10.2 all points q of A such that

10.6    $\overrightarrow{p_0 q} = x_1 \cdot \overrightarrow{p_0 p_1} + x_2 \cdot \overrightarrow{p_0 p_2}$    ($x_1$, $x_2$ elements of K)

form a plane P in A. If on the other hand P is a plane in A, corresponding to some
subspace V of T(A), and if $p_0$ is any point of P, then V has a base consisting of
two linearly independent vectors $\overrightarrow{p_0 p_1}$ and $\overrightarrow{p_0 p_2}$, since by assumption V has
dimension 2. Both $p_1$ and $p_2$ must also be in P, and every point q of P satisfies Eq.
(10.6), and conversely if q satisfies (10.6) it must be a point of P.

Higher dimensional subspaces of A can be described in an analogous way. If
A has finite dimension n, then A itself can be described in this way. We shall take
that up in detail presently. We mention that a subspace of dimension n − 1 of
A is sometimes called a hyperplane.


DEFINITION 10.3 Let A and T(A) be as in Definition 10.1, and let $p_0, p_1, \ldots, p_r$
be r + 1 points in A. Then they are said to be independent if the r vectors
$\overrightarrow{p_0 p_1}, \overrightarrow{p_0 p_2}, \ldots, \overrightarrow{p_0 p_r}$ in T(A) are linearly independent. Otherwise the given
points are said to be dependent.

It is easy to see that it is immaterial which point is called $p_0$ in this definition.
This follows from the theorem below but can also be proved directly.

THEOREM 10.2 If $p_0, p_1, \ldots, p_r$ are independent points in the affine space A, then
there is one and only one affine subspace A′ of dimension r which contains them; and
no affine subspace of smaller dimension can contain them. If however the points are
dependent, there is an affine subspace of dimension less than r which contains them.

Proof. Let V be the subspace of T(A) spanned by the translations
$\overrightarrow{p_0 p_1}, \overrightarrow{p_0 p_2}, \ldots, \overrightarrow{p_0 p_r}$. Then clearly dim V = r if the points are
independent, by Definition 10.3. Let A′ be the affine subspace of A obtained
(Definition 10.2) by applying all the translations in V to $p_0$. Then
A′ has dimension r, by Definition 10.1, and it contains $p_0, p_1, \ldots, p_r$.
Now let A″ be any affine subspace of A which contains the r + 1 points.
By Definition 10.2 and the remarks following it, all vectors of the form
$\overrightarrow{p_0 q}$, with q in A″, must constitute a subspace U of T(A). Taking $q = p_1$,


194 Veetor spaces and affine spaces Ch. 8, Sec. 10 


$\ldots, p_r$ we find that U must contain the r translations $\overrightarrow{p_0 p_1}, \ldots, \overrightarrow{p_0 p_r}$.
Hence U must contain V, and so either dim U > r or else U = V
and A′ = A″. Finally, if the points are dependent, then dim V < r and
so dim A′ < r. Q.E.D.


Two points $p_0$ and $p_1$ of A are independent if they are distinct, and they
determine a line in A, as we have seen earlier. If three points $p_0$, $p_1$, $p_2$ are dependent,
then they lie on a line, as the theorem shows. They are then said to be collinear.
If they are independent, then they determine a plane in A, as we have seen above.
If A has finite dimension n, then A contains a set of n + 1 independent points;
but more than n + 1 points must be dependent.
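
Definition 10.3 can be tried out numerically. The following is only a minimal sketch, assuming that K is the rational field and that each point is given by its coordinates in some cartesian coordinate system, so that the components of the vector from $p_0$ to $p_j$ are coordinate differences. The helper names `rank` and `points_independent` are illustrative and do not come from the text.

```python
from fractions import Fraction

def rank(rows):
    """Rank of a list of vectors, by Gaussian elimination over the rationals."""
    m = [[Fraction(v) for v in row] for row in rows]
    r = 0
    ncols = len(m[0]) if m else 0
    for col in range(ncols):
        # Look for a row at or below position r with a nonzero entry in this column.
        pivot = next((i for i in range(r, len(m)) if m[i][col] != 0), None)
        if pivot is None:
            continue
        m[r], m[pivot] = m[pivot], m[r]
        for i in range(r + 1, len(m)):
            factor = m[i][col] / m[r][col]
            m[i] = [a - factor * b for a, b in zip(m[i], m[r])]
        r += 1
    return r

def points_independent(points):
    """p0, ..., pr are independent iff the r vectors p0->pj are linearly independent."""
    p0, rest = points[0], points[1:]
    vectors = [[b - a for a, b in zip(p0, p)] for p in rest]
    return rank(vectors) == len(vectors)

triangle = [(0, 0), (1, 0), (0, 1)]               # nondegenerate triangle
collinear = [(0, 0), (1, 1), (2, 2)]              # three points on one line
four_in_plane = [(0, 0), (1, 0), (0, 1), (1, 1)]  # more than n + 1 points, n = 2

print(points_independent(triangle))
print(points_independent(collinear))
print(points_independent(four_in_plane))
```

The three sample sets correspond to the cases discussed above: a nondegenerate triangle, three collinear points, and four points in a plane.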

We conclude this section by showing how the points of an affine space A of
finite dimension n can be represented by coordinates. By definition the vector
space T(A) then has dimension n. Let us choose a base for it, say
$\overrightarrow{p_0 p_1}, \overrightarrow{p_0 p_2}, \ldots, \overrightarrow{p_0 p_n}$, where $p_0$ is some selected point of A.
If q is an arbitrary point of A, then the translation $\overrightarrow{p_0 q}$ can be expressed
uniquely in the form

10.7    $\overrightarrow{p_0 q} = x_1 \cdot \overrightarrow{p_0 p_1} + x_2 \cdot \overrightarrow{p_0 p_2} + \cdots + x_n \cdot \overrightarrow{p_0 p_n}$

where $x_1, x_2, \ldots, x_n$ are scalars in K. Combining Theorems 9.2 and 10.1 we
see that the mapping

10.8    $q \to (x_1, x_2, \ldots, x_n)$

is a one-to-one mapping of A to $K_n$. A mapping obtained in this way is called a
cartesian coordinate system in A with origin $p_0$; and $x_1, \ldots, x_n$ are called the
coordinates of the point q. From (10.7) it is clear that the coordinates of $p_0$ are all
zero because $\overrightarrow{p_0 p_0} = O$. It is obvious that (10.5) and (10.6) are special cases of
(10.7).

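Let us now determine how two different cartesian coordinate systems in A are
related. Then let $\overrightarrow{s_0 s_1}, \overrightarrow{s_0 s_2}, \ldots, \overrightarrow{s_0 s_n}$ be a new base for T(A), and let the new
coordinates of q be $y_1, y_2, \ldots, y_n$. That is,

10.9    $\overrightarrow{s_0 q} = y_1 \cdot \overrightarrow{s_0 s_1} + \cdots + y_n \cdot \overrightarrow{s_0 s_n}$

Further, let the new origin have coordinates $b_1, \ldots, b_n$ in the old system. That
is, from (10.7),

10.10    $\overrightarrow{p_0 s_0} = b_1 \cdot \overrightarrow{p_0 p_1} + \cdots + b_n \cdot \overrightarrow{p_0 p_n}$

Subtracting this from (10.7) and taking account of the relation
$\overrightarrow{p_0 q} - \overrightarrow{p_0 s_0} = \overrightarrow{s_0 q}$ we get

10.11    $\overrightarrow{s_0 q} = (x_1 - b_1) \cdot \overrightarrow{p_0 p_1} + \cdots + (x_n - b_n) \cdot \overrightarrow{p_0 p_n}$

We have the translation $\overrightarrow{s_0 q}$ expressed in terms of two different bases in (10.9)
and (10.11). As pointed out at the end of Sec. 9, the two bases for T(A) are
related by equations of the type (9.5), say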


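10.12    $\overrightarrow{s_0 s_j} = a_{j1} \cdot \overrightarrow{p_0 p_1} + \cdots + a_{jn} \cdot \overrightarrow{p_0 p_n}$    (j = 1, ..., n)

Then applying (9.6) to the translation $\overrightarrow{s_0 q}$ we get

10.13    $x_i - b_i = a_{1i}\,y_1 + \cdots + a_{ni}\,y_n$    (i = 1, ..., n)

from (10.9) and (10.11), or

10.14    $x_1 = b_1 + a_{11}\,y_1 + \cdots + a_{n1}\,y_n$
         $\cdots$
         $x_n = b_n + a_{1n}\,y_1 + \cdots + a_{nn}\,y_n$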


A relation of this type connecting $(x_1, \ldots, x_n)$ and $(y_1, \ldots, y_n)$ is called an
affine transformation. In Sec. 12 we shall have some specific examples. Observe
that if the two origins $p_0$ and $s_0$ are the same, then the b's in (10.14) are all zero.
On the other hand, if the two bases in T(A) are the same, then (10.14) reduces to

$x_1 = b_1 + y_1, \; \ldots, \; x_n = b_n + y_n$

or

$(x_1, \ldots, x_n) = (b_1, \ldots, b_n) + (y_1, \ldots, y_n)$

The two coordinate systems are said to be related by a translation.
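
A change of coordinates of the type (10.14) can be sketched in a few lines. This is only an illustration under an assumed index convention: row j of the hypothetical list `a` holds the old components of the j-th new base vector, so that old coordinate i equals `b[i]` plus the sum over j of `a[j][i] * y[j]`. The function name `old_from_new` and the sample numbers are not from the text.

```python
def old_from_new(y, a, b):
    """Old coordinates x of the point whose new coordinates are y.

    Assumed convention: a[j] lists the old components of the j-th new base
    vector, and b lists the old coordinates of the new origin.
    """
    n = len(y)
    return [b[i] + sum(a[j][i] * y[j] for j in range(n)) for i in range(n)]

a = [[1, 1],   # first new base vector, in old components
     [0, 2]]   # second new base vector, in old components
b = [3, -1]    # old coordinates of the new origin

print(old_from_new([0, 0], a, b))  # the new origin itself: [3, -1]
print(old_from_new([1, 0], a, b))  # [4, 0]
print(old_from_new([2, 5], a, b))  # [5, 11]

# With the same base in both systems, the relation is a pure translation.
identity = [[1, 0], [0, 1]]
print(old_from_new([2, 5], identity, b))  # [5, 4]
```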
Let us now go back to the first coordinate system $x_1, \ldots, x_n$ of (10.7).
Consider the one-dimensional subspace of T(A) spanned by the base vector $\overrightarrow{p_0 p_j}$.
Every element in it can be expressed in the form $x \cdot \overrightarrow{p_0 p_j}$, and each such translation
applied to $p_0$ gives a point q such that $\overrightarrow{p_0 q} = x \cdot \overrightarrow{p_0 p_j}$. Referring to (10.5), all
such points q fill up a line $L_j$ in A. That is, $L_j$ consists of all points q whose
coordinates $x_1, \ldots, x_{j-1}, x_{j+1}, \ldots, x_n$ are all zero. The lines $L_1, L_2, \ldots, L_n$
are called the coordinate axes of the particular coordinate system.
Consider now the subspace $V_j$ of T(A) of dimension n − 1 spanned by
$\overrightarrow{p_0 p_1}, \ldots, \overrightarrow{p_0 p_{j-1}}, \overrightarrow{p_0 p_{j+1}}, \ldots, \overrightarrow{p_0 p_n}$, that is, by all the base vectors
except $\overrightarrow{p_0 p_j}$. If we apply all the translations in $V_j$ to the point $p_0$, we get an
affine subspace $A'_j$,


Figure 9  The most general affine transformation of affine 3-space carries any set
of coordinate axes into another. It sends lines into lines and, more generally, it sends
affine subspaces into affine subspaces. The general affine transformation of
euclidean space sends a cube into a parallelepiped.




by Definition 10.2, and clearly it consists of all points q for which the jth coordinate
$x_j$ is zero. The n hyperplanes $A'_1, \ldots, A'_n$ obtained in this way are called the
coordinate hyperplanes of the system $x_1, x_2, \ldots, x_n$. Observe that the coordinate
axis $L_j$ consists of all the points common to the n − 1 coordinate hyperplanes
$A'_1, \ldots, A'_{j-1}, A'_{j+1}, \ldots, A'_n$.

More generally, if $c_0, c_1, \ldots, c_n$ are elements of K, of which the last n are
not all zero, then the equation

10.15    $c_0 + c_1 x_1 + \cdots + c_n x_n = 0$

determines a hyperplane in A as follows: Let V be the (n − 1)-dimensional
subspace of T(A) consisting of all translations
$x_1 \cdot \overrightarrow{p_0 p_1} + \cdots + x_n \cdot \overrightarrow{p_0 p_n}$ such that

10.16    $c_1 x_1 + \cdots + c_n x_n = 0$

Let $q_0$ be a point whose coordinates $x_1^0, \ldots, x_n^0$ satisfy Eq. (10.15), and let q
be a point with coordinates $x_1, \ldots, x_n$. We have

$\overrightarrow{q_0 q} = \overrightarrow{p_0 q} - \overrightarrow{p_0 q_0} = (x_1 - x_1^0) \cdot \overrightarrow{p_0 p_1} + \cdots + (x_n - x_n^0) \cdot \overrightarrow{p_0 p_n}$

using (10.7). Therefore, referring to (10.16), we see that $\overrightarrow{q_0 q}$ is in the subspace
V of T(A) if and only if the coordinates $x_1, \ldots, x_n$ of q also satisfy Eq. (10.15).
Hence, by Definition 10.2, all points q whose coordinates satisfy (10.15) form a
hyperplane A′ in A. If we specialize (10.15) by taking $c_j = 1$ and all the other
c's equal to zero, then A′ is the coordinate hyperplane $A'_j$.

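Finally let us see how to represent the points of the affine subspace B of A
determined by r + 1 independent points $q_0, q_1, \ldots, q_r$ (see Theorem 10.2). In
the coordinate system (10.7) let $q_j$ have coordinates $a_{j1}, a_{j2}, \ldots, a_{jn}$
(j = 0, 1, ..., r). By definition those scalars are the components of the vector
$\overrightarrow{p_0 q_j}$ with respect to the base $\overrightarrow{p_0 p_1}, \ldots, \overrightarrow{p_0 p_n}$. Then

$\overrightarrow{q_0 q_j} = \overrightarrow{p_0 q_j} - \overrightarrow{p_0 q_0} = (a_{j1} - a_{01}) \cdot \overrightarrow{p_0 p_1} + \cdots + (a_{jn} - a_{0n}) \cdot \overrightarrow{p_0 p_n}$

From the proof of Theorem 10.2 it is clear that a point q is in B if and only if
$\overrightarrow{q_0 q}$ is a linear combination of the independent vectors
$\overrightarrow{q_0 q_1}, \ldots, \overrightarrow{q_0 q_r}$, say

$\overrightarrow{q_0 q} = t_1 \cdot \overrightarrow{q_0 q_1} + \cdots + t_r \cdot \overrightarrow{q_0 q_r}$
$= \sum_{j=1}^{r} t_j \left[ (a_{j1} - a_{01}) \cdot \overrightarrow{p_0 p_1} + \cdots + (a_{jn} - a_{0n}) \cdot \overrightarrow{p_0 p_n} \right]$

If q has coordinates $x_1, \ldots, x_n$, then
$\overrightarrow{q_0 q} = (x_1 - a_{01}) \cdot \overrightarrow{p_0 p_1} + \cdots + (x_n - a_{0n}) \cdot \overrightarrow{p_0 p_n}$.
Equating coefficients of $\overrightarrow{p_0 p_k}$ in this and the preceding expression we obtain

10.17    $x_k - a_{0k} = \sum_{j=1}^{r} t_j\,(a_{jk} - a_{0k})$    (k = 1, ..., n)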
ft 


As the $t_j$ vary through the field of scalars K, the point q with coordinates
$x_1, \ldots, x_n$ given by (10.17) varies through the subspace B, giving a one-to-one
mapping of B to $K_r$. In fact the correspondence $q \to (t_1, \ldots, t_r)$ is just a cartesian
coordinate system in B. We can write (10.17) in a form which is not biased in favor
of $q_0$ as follows: Put $t_0 = 1 - t_1 - t_2 - \cdots - t_r$. Then it is immediate that
(10.17) is the same as

10.18    $x_k = t_0 a_{0k} + t_1 a_{1k} + \cdots + t_r a_{rk}$    (k = 1, ..., n;  $t_0 + t_1 + \cdots + t_r = 1$)

As a special case, let $q_0$, $q_1$ be two distinct points of A. The subspace of A that
they determine is a line L. If $q_0$ has coordinates $a_1, \ldots, a_n$ and if $q_1$ has
coordinates $b_1, \ldots, b_n$, then (10.17) becomes

10.19    $x_k - a_k = t\,(b_k - a_k)$    (k = 1, ..., n)

These equations are called parametric equations of the line L: as t varies through
K the point with coordinates $(x_1, \ldots, x_n)$ given by (10.19) varies through L.
Equation (10.19) is just (10.5) written out in terms of vector components (with
some minor changes of notation). If we put $t_1 = t$, $t_0 = 1 - t_1$, then (10.19) goes
into the form (10.18):

10.20    $x_k = t_0 a_k + t_1 b_k$    (k = 1, ..., n;  $t_0 + t_1 = 1$)
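
The agreement between the parametric form (10.19) and the symmetric form (10.18) can be checked on a small example. The sketch below assumes the rational field for K and uses made-up points; the function names `affine_combination` and `point_on_line` are illustrative only.

```python
from fractions import Fraction

def affine_combination(points, weights):
    """Form (10.18): x_k = t0*a_0k + ... + tr*a_rk, where the weights sum to 1."""
    if sum(weights) != 1:
        raise ValueError("weights must sum to 1")
    n = len(points[0])
    return [sum(t * p[k] for t, p in zip(weights, points)) for k in range(n)]

def point_on_line(q0, q1, t):
    """Form (10.19): x_k = a_k + t*(b_k - a_k)."""
    return [a + t * (b - a) for a, b in zip(q0, q1)]

q0, q1 = (1, 2, 0), (3, 0, 4)   # two distinct sample points
t = Fraction(1, 4)

on_line = point_on_line(q0, q1, t)
symmetric = affine_combination([q0, q1], [1 - t, t])
print(on_line == symmetric)
```

Taking t = 0 gives back the first point and t = 1 the second, which matches the roles of $t_0$ and $t_1$ in (10.20).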


EXERCISES 

1. Let A′ be an affine subspace of an affine space A of finite dimension. Show
that dim A′ ≤ dim A, with equality holding if and only if A′ = A.

2. Let A′ and A″ be two affine subspaces of an affine space A. Prove that the
set of points common to both A′ and A″ is either empty or else is an affine
subspace of A. Prove that two distinct lines in A either have no points in common or
else intersect in a single point. Prove that two distinct planes in A either have no
points in common or else intersect in a point or a line.

3. Let A be an affine space of dimension n. Prove that, if n = 3, two distinct
planes in A intersect in a line or else have no points in common. Show that, in
general, two planes in A have no points in common if n > 4.

4. Let A be an affine plane (that is, dim A = 2), and let x, y be a cartesian
coordinate system in A. Prove that if a, b, and c are scalars with a and b not both
zero, then the set of all points in A whose coordinates x, y satisfy the equation

ax + by + c = 0

is a line. Show conversely that every line in A can be obtained in this way by a
suitable choice of a, b, c. Show how to determine a, b, c for the unique line passing
through two distinct points $p_0$ and $p_1$, with coordinates $(x_0, y_0)$ and $(x_1, y_1)$,
respectively.

5. Let A be an affine space of dimension n, and let $x_1, x_2, \ldots, x_n$ be a cartesian
coordinate system in A. Prove that every hyperplane in A is determined by an
equation of the type (10.15). Let A′ and A″ be affine subspaces of dimensions r
and r − 1, respectively. Show that there is a hyperplane in A such that A″
consists of all the points which that hyperplane has in common with A′. Use this
result to prove by induction that A′ is the intersection of n − r hyperplanes in A.

6. Let A be an affine 3-space over the real field R, and let (x, y, z) be a system of
cartesian coordinates in A. Determine the line passing through the two points
(1, 0, 3) and (−1, 2, 4); determine the plane passing through the points (1, 0, 3),
(−1, 2, 4), and (4, 1, 2). (We mean, determine the conditions that must be
satisfied by the coordinates of a point in order for that point to be in the line or plane.)


11. Euclidean spaces


In general there is no way of defining lengths of vectors in a vector space, or angles
between vectors. Similarly there is in general no way of defining distance between
two points of an affine space, or of defining the angle between two intersecting lines.
The axioms for vector and affine spaces simply do not provide any apparatus for
making such definitions. We remark furthermore that it is quite possible for a
vector space or an affine space to have only a finite number of points, even though
the dimension is greater than zero. Indeed if the field of scalars K has only a
finite number of elements, say r (as is the case with the fields $Z_p$ of Chap. 2), then
any vector or affine space of dimension n over K has just $r^n$ elements, as is readily
seen. It would be unreasonable to hope for a definition of angle or length in such a
situation. We now turn our attention to a case where such definitions can be
made, namely, to vector and affine spaces over the real number field R. We begin
with the following provisional definition; a more complete one is given in Chap. 14.
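
The count of elements in a space of dimension n over a finite field can be confirmed by brute force for small cases. This is a minimal sketch, assuming the r field elements are represented by the integers 0, ..., r − 1 and that points are identified with n-tuples of coordinates; the function name `all_vectors` is illustrative.

```python
from itertools import product

def all_vectors(r, n):
    """All n-tuples with entries in {0, ..., r-1}: one tuple per point of the space."""
    return list(product(range(r), repeat=n))

print(len(all_vectors(3, 2)))  # 9
print(len(all_vectors(2, 3)))  # 8
```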


DEFINITION 11.1 Let V be a vector space of finite dimension n over the real field R.
Then V is called a euclidean vector space if there is assigned to every element x of V
a real number, denoted by |x|, such that for at least one choice of base
$\mathbf{e}_1, \ldots, \mathbf{e}_n$ in V we have

11.1    $|\mathbf{x}| = (x_1^2 + \cdots + x_n^2)^{1/2}$    if $\mathbf{x} = x_1 \mathbf{e}_1 + \cdots + x_n \mathbf{e}_n$


It is clear that |x| ≥ 0 and |x| = 0 if and only if x = 0. The number |x| is
called the length of the vector x. Starting with any n-dimensional vector space V
over R we can make it into a euclidean vector space by simply choosing any base
in V and then defining |x| by the formula (11.1). We mention that the particular
form of (11.1) is dictated by the fact that it makes the Pythagorean theorem true,
as we shall see presently, and that would not be so if we had perversely decided to
make the right-hand member of (11.1), say, $(x_1^4 + \cdots + x_n^4)^{1/4}$. However, the
form chosen for (11.1) is not the only one which makes the Pythagorean theorem
true.

According to Definition 11.1, the number |x| is assumed to be given, without
reference to any base in V; it is merely required that there be some base for which
(11.1) is valid. From it we deduce the equation

11.2    $|t\mathbf{x}|^2 = t^2\,|\mathbf{x}|^2$    (t any real number)

We now define the inner product (x, y) of any two elements x and y of V by the
equation

11.3    $|\mathbf{x} + \mathbf{y}|^2 = |\mathbf{x}|^2 + 2(\mathbf{x}, \mathbf{y}) + |\mathbf{y}|^2$


Putting x for y here and using (11.2) we find for example that $(\mathbf{x}, \mathbf{x}) = |\mathbf{x}|^2$.
Putting O for y we get (x, O) = 0. Clearly (x, y) = (y, x) for any two vectors. If
$\mathbf{e}_1, \ldots, \mathbf{e}_n$ is a base for which (11.1) holds, and if
$\mathbf{x} = x_1 \mathbf{e}_1 + \cdots + x_n \mathbf{e}_n$ and $\mathbf{y} = y_1 \mathbf{e}_1 + \cdots + y_n \mathbf{e}_n$,
then it is easily seen from (11.1) and (11.3) that

11.4    $(\mathbf{x}, \mathbf{y}) = x_1 y_1 + x_2 y_2 + \cdots + x_n y_n$

From this formula it is clear that

11.5    $(t\mathbf{x}, \mathbf{y}) = t \cdot (\mathbf{x}, \mathbf{y})$    and    $(\mathbf{x} + \mathbf{y}, \mathbf{z}) = (\mathbf{x}, \mathbf{z}) + (\mathbf{y}, \mathbf{z})$

for any vectors x, y, z and any real number t. Two vectors x and y are said to be
orthogonal if (x, y) = 0. A vector of length 1 is called a unit vector. Applying
(11.4) to the base vectors $\mathbf{e}_j$ we get at once

11.6    $(\mathbf{e}_i, \mathbf{e}_j) = \begin{cases} 1 & \text{if } i = j \\ 0 & \text{if } i \ne j \end{cases}$

Thus if (11.1) is valid for the base $\mathbf{e}_j$, then the vectors are all unit vectors and any
two different vectors in the base are orthogonal. For this reason the base is said
to be orthonormal. It is easy to see conversely that if a base for V is orthonormal,
then (11.1) holds for that base.
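
The definition of the inner product through lengths alone, Eq. (11.3), can be compared numerically with the component formula (11.4). The sketch below assumes vectors are given by their components with respect to a base for which (11.1) holds; the function names are illustrative and floating-point results are compared with a tolerance.

```python
import math

def length(x):
    """Formula (11.1): square root of the sum of squared components."""
    return math.sqrt(sum(c * c for c in x))

def inner_from_length(x, y):
    """Solve (11.3) for (x, y): half of |x + y|^2 - |x|^2 - |y|^2."""
    s = [a + b for a, b in zip(x, y)]
    return (length(s) ** 2 - length(x) ** 2 - length(y) ** 2) / 2

def inner(x, y):
    """Formula (11.4): sum of products of corresponding components."""
    return sum(a * b for a, b in zip(x, y))

x, y = (1, 2, 2), (3, 0, 4)
print(inner(x, y))                                            # 11
print(math.isclose(inner_from_length(x, y), inner(x, y)))     # True
```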


THEOREM 11.1 (Schwarz's inequality) For any two vectors x and y in V,

11.7    $(\mathbf{x}, \mathbf{y})^2 \le |\mathbf{x}|^2\,|\mathbf{y}|^2$

Proof. Replace x in (11.3) by tx, t being any real number. Using
(11.2) and (11.5), we have $|t\mathbf{x} + \mathbf{y}|^2 = t^2 |\mathbf{x}|^2 + 2t(\mathbf{x}, \mathbf{y}) + |\mathbf{y}|^2$.
Temporarily write $a = |\mathbf{x}|^2$, $b = (\mathbf{x}, \mathbf{y})$, and $c = |\mathbf{y}|^2$. Since
$|t\mathbf{x} + \mathbf{y}|^2 \ge 0$, we have then

$at^2 + 2bt + c \ge 0$

and this must hold for any real number t. Consequently this quadratic
polynomial cannot have two distinct real roots, and therefore $b^2 \le ac$,
which is what we wanted to prove (see Exercise 5, Sec. 6, Chap. 6). Q.E.D.


Now let x and y be any two nonzero vectors in our euclidean vector space V.
Taking square roots in the Schwarz inequality we get

$|(\mathbf{x}, \mathbf{y})| \le |\mathbf{x}| \cdot |\mathbf{y}|$    [|(x, y)| = absolute value of (x, y)]

Since |x| ≠ 0 and |y| ≠ 0, there follows

$-1 \le \dfrac{(\mathbf{x}, \mathbf{y})}{|\mathbf{x}| \cdot |\mathbf{y}|} \le 1$

Therefore there is a unique number θ between 0 and π (inclusive) such that

11.8    $\cos\theta = \dfrac{(\mathbf{x}, \mathbf{y})}{|\mathbf{x}| \cdot |\mathbf{y}|}$


We define θ to be the angle between x and y, and we shall also denote it by

θ = ∢ x, y

Observe that θ = π/2 if (x, y) = 0. Therefore our earlier definition of
orthogonality agrees with the definition of angle above. It is easily verified, using (11.2)
and (11.5), that if a and b are any positive numbers, then the angle between ax
and by is the same as the angle between x and y. Taking x = y in (11.8) we get
cos θ = 1; hence θ = 0. This shows that θ has at least some of the properties we
ought to expect of the definition of angle. Writing (11.8) as
$(\mathbf{x}, \mathbf{y}) = |\mathbf{x}| \cdot |\mathbf{y}| \cos\theta$ and putting this in (11.3) we get

11.9    $|\mathbf{x} + \mathbf{y}|^2 = |\mathbf{x}|^2 + 2\,|\mathbf{x}| \cdot |\mathbf{y}| \cos\theta + |\mathbf{y}|^2$

which is a form of the law of cosines. From it there follows at once the triangle
inequality

11.10    $|\mathbf{x} + \mathbf{y}| \le |\mathbf{x}| + |\mathbf{y}|$

for the right-hand side of (11.9) cannot exceed $(|\mathbf{x}| + |\mathbf{y}|)^2$.
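
The angle of (11.8), the law of cosines (11.9), and the triangle inequality (11.10) can be checked on sample vectors. This is a sketch under the assumption that components refer to an orthonormal base, so that (11.4) applies; the clamping step only guards against floating-point rounding and is not part of the text's definition.

```python
import math

def inner(x, y):
    return sum(a * b for a, b in zip(x, y))

def length(x):
    return math.sqrt(inner(x, x))

def angle(x, y):
    """The number theta in [0, pi] with cos(theta) = (x, y) / (|x| |y|), for nonzero x, y."""
    c = inner(x, y) / (length(x) * length(y))
    return math.acos(max(-1.0, min(1.0, c)))  # clamp to [-1, 1] against rounding

x, y = (1, 2), (3, -1)
theta = angle(x, y)
s = [a + b for a, b in zip(x, y)]

# Law of cosines (11.9) and triangle inequality (11.10).
lhs = length(s) ** 2
rhs = length(x) ** 2 + 2 * length(x) * length(y) * math.cos(theta) + length(y) ** 2
print(math.isclose(lhs, rhs))
print(length(s) <= length(x) + length(y))
```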

Now let x have components $x_1, \ldots, x_n$ with respect to an orthonormal base
$\mathbf{e}_1, \ldots, \mathbf{e}_n$, so that $\mathbf{x} = x_1 \mathbf{e}_1 + \cdots + x_n \mathbf{e}_n$. Since the components of
$\mathbf{e}_j$ with respect to that base are all zero except the jth, which is 1, it follows at
once from (11.4) and (11.8) that

11.11    $(\mathbf{x}, \mathbf{e}_j) = x_j = |\mathbf{x}| \cos\alpha_j$

where $\alpha_j$ denotes the angle between x and $\mathbf{e}_j$. In particular, if x is a unit vector,
then its components with respect to the orthonormal base are the same as the
cosines of the angles that x makes with the base vectors.

We now show how these definitions can be carried over into affine spaces. 


DEFINITION 11.2 An affine space E of finite dimension n over the real field R is called
a euclidean space if its vector space of translations T(E) has the structure of a euclidean
vector space.

It is simply required that there be assigned to every translation t in T(E) a
length |t| satisfying Definition 11.1; such a length can always be defined.
Supposing then that E is a euclidean space, it is very easy to define the distance |pq|
between two of its points p and q: we put

11.12    $|pq| = |\overrightarrow{pq}|$

where on the right stands the length of the translation vector $\overrightarrow{pq}$. It is clear that
|pq| = |qp|, from (11.12) and (11.2). From (10.2) we have
$\overrightarrow{pq} + \overrightarrow{qr} = \overrightarrow{pr}$, and from (11.9) there follows the law of cosines

11.13    $|pr|^2 = |pq|^2 + |qr|^2 - 2\,|pq| \cdot |qr| \cdot \cos\theta$

where θ is the angle between $\overrightarrow{qp}$ and $\overrightarrow{qr}$. In particular, if those two vectors are
orthogonal, then (11.13) reduces to the pythagorean theorem. From (11.10) we
have the triangle inequality

11.14    $|pr| \le |pq| + |qr|$

It is also very easy to define the angle between two lines L and L′, but some
preliminary remarks are necessary. If L is a line in E, then all the translations
$\overrightarrow{pq}$ (p and q points of L) form the one-dimensional vector space T(L). And since
that space has dimension 1, it follows that there are exactly two unit vectors in it;
if one of them is u, then the other is −u. By an oriented line L we shall mean a line
along with one of the two unit vectors ±u. If then L and L′ are two oriented lines,
with associated unit vectors u and u′, respectively, we define the angle between L
and L′ by

11.15    ∢ L, L′ = ∢ u, u′

the right-hand side being of course defined by (11.8). If the orientations of the two
lines are reversed (that is, if u, u′ are replaced by −u, −u′), then the angle between
L and L′ is not changed. If however the orientation of just one of the two lines is
reversed, then the angle (11.15), call it θ, is replaced by π − θ, as follows from
(11.8). The two lines are said to be parallel if the angle between them is 0 or π;
they are said to be orthogonal (or perpendicular) if the angle is π/2. In the latter
case it is clear that the orientation is a matter of indifference. It is possible to
define angles between affine subspaces of E of dimension greater than 1, but we
shall not go into that here.

We point out that either of the vectors u or −u in T(L) satisfies the conditions
of Definition 11.1. For if x is any vector in T(L), then $\mathbf{x} = x \cdot \mathbf{u}$ for some real
number x, and $|\mathbf{x}|^2 = x^2 |\mathbf{u}|^2 = x^2$, by (11.2), and this is just Eq. (11.1) for n = 1.
Hence T(L) is a euclidean vector space, and so L itself is a euclidean space, by
Definition 11.2. It is not hard to show that any affine subspace of the euclidean
space E is also a euclidean space. For example, let P be a plane in E. Then the
vector space of translations T(P) has dimension 2. Let $\mathbf{u}_1$ be any nonzero vector in
T(P). By multiplying $\mathbf{u}_1$ by a suitable number (namely, the reciprocal of its
length) we obtain a unit vector. Therefore let us assume that $\mathbf{u}_1$ itself is a unit
vector in T(P). Since $\mathbf{u}_1$ does not span T(P), there is a vector w in T(P) which is
not a multiple of $\mathbf{u}_1$. Put $\mathbf{w}' = \mathbf{w} - a\,\mathbf{u}_1$, where $a = (\mathbf{u}_1, \mathbf{w})$. From (11.5) we
have $(\mathbf{u}_1, \mathbf{w}') = (\mathbf{u}_1, \mathbf{w} - a\,\mathbf{u}_1) = (\mathbf{u}_1, \mathbf{w}) - a \cdot (\mathbf{u}_1, \mathbf{u}_1) = a - a = 0$. Hence
$\mathbf{u}_1$ and $\mathbf{w}'$ are orthogonal, and $\mathbf{w}' \ne O$, for otherwise w would be a multiple of
$\mathbf{u}_1$. Set finally $\mathbf{u}_2 = b\,\mathbf{w}'$, where $b = 1/|\mathbf{w}'|$. It is immediate that $\mathbf{u}_1$ and
$\mathbf{u}_2$ form an orthonormal base for T(P), which must therefore be a euclidean vector
space. Hence P is a euclidean space, by Definition 11.2. A similar procedure can be
used for affine subspaces of any dimension in E.
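
The construction just described (subtract the component along the unit vectors already found, then rescale to length 1) can be repeated for any number of independent vectors. The following is only a sketch of that repetition, assuming components with respect to an orthonormal base of the surrounding space; the name `orthonormalize` and the tolerance 1e-12 are choices made for this illustration.

```python
import math

def inner(x, y):
    return sum(a * b for a, b in zip(x, y))

def orthonormalize(vectors):
    """Turn linearly independent vectors into an orthonormal base of their span."""
    base = []
    for w in vectors:
        w = list(w)
        for u in base:
            a = inner(u, w)                          # a = (u, w), as in the text
            w = [wi - a * ui for wi, ui in zip(w, u)]  # w' = w - a*u
        norm = math.sqrt(inner(w, w))
        if norm < 1e-12:
            raise ValueError("vectors are linearly dependent")
        base.append([wi / norm for wi in w])         # rescale to a unit vector
    return base

u1, u2 = orthonormalize([(3, 4, 0), (1, 0, 0)])
print(u1)
print(u2)
print(math.isclose(inner(u1, u2), 0.0, abs_tol=1e-12))
```

For the sample input the first vector is rescaled to (0.6, 0.8, 0), and the second comes out close to (0.8, −0.6, 0), up to rounding.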

There is an important relation between lines and hyperplanes in E. Let L be a
line, and let u be one of the two unit vectors in T(L). Let V be the subset of T(E)
consisting of all translations which are orthogonal to u. From (11.5) it is easy to
verify that V is a subspace of T(E). Furthermore, V has dimension n − 1. For
there exist n − 1 vectors $\mathbf{w}_1, \ldots, \mathbf{w}_{n-1}$ in T(E) such that
$\mathbf{u}, \mathbf{w}_1, \ldots, \mathbf{w}_{n-1}$ span T(E) (Theorem 9.3). Put $\mathbf{v}_j = \mathbf{w}_j - a_j \cdot \mathbf{u}$, where
$a_j = (\mathbf{u}, \mathbf{w}_j)$. Then, as remarked above, the vectors $\mathbf{v}_j$ are all orthogonal to u,
and they form a base for V, as is quickly seen. Now let $p_0$ be any point. Then the set
of all points q such that $\overrightarrow{p_0 q}$ is in V is a hyperplane H in E, by Definition 10.2, and
T(H) = V. Every element of T(H) is orthogonal to every element of T(L), since the
latter is spanned by u; and the line determined by any two points of H is orthogonal
to L. For this reason we say that the affine subspaces L and H are orthogonal. H can
be described briefly by

11.16    H = set of all points q such that $(\mathbf{u}, \overrightarrow{p_0 q}) = 0$

H is uniquely determined by L and $p_0$.

In a similar way, if we start off with a hyperplane H in E and let U be the set of
all translations in T(E) which are orthogonal to every translation in T(H), then U
is a one-dimensional subspace of T(E) (see Exercise 4 below). Fixing some point
$p_0$, the set of all points q such that $\overrightarrow{p_0 q}$ is in U is a line L which is orthogonal to H.

It is not hard to generalize these considerations to affine subspaces of dimension
r and n − r in E, but we shall not go into that here. Observe that for n = 3 a
hyperplane is just a plane; for n = 2 a hyperplane is a line.

We now take up the question of coordinate systems in E. Since T(E) is a
euclidean vector space, there must be an orthonormal base $\mathbf{e}_1, \mathbf{e}_2, \ldots, \mathbf{e}_n$ in T(E),
by Definition 11.1. Choosing a point $p_0$ in E as origin, then for any point q the
vector $\overrightarrow{p_0 q}$ can be expressed uniquely in the form

11.17    $\overrightarrow{p_0 q} = x_1 \mathbf{e}_1 + \cdots + x_n \mathbf{e}_n$

which is just (10.7) with minor changes of notation. By the definition of Sec. 10,
the mapping $q \to (x_1, \ldots, x_n)$ is a cartesian coordinate system in E. We shall
call it a euclidean coordinate system because of the additional assumption that the
base vectors are orthonormal. If p is a point with coordinates $y_1, \ldots, y_n$, that is,
if $\overrightarrow{p_0 p} = y_1 \mathbf{e}_1 + \cdots + y_n \mathbf{e}_n$, then
$\overrightarrow{pq} = \overrightarrow{p_0 q} - \overrightarrow{p_0 p} = (x_1 - y_1) \cdot \mathbf{e}_1 + \cdots + (x_n - y_n) \cdot \mathbf{e}_n$,
and so from (11.12) and (11.1) we get

11.18    $|pq| = \left[ (x_1 - y_1)^2 + \cdots + (x_n - y_n)^2 \right]^{1/2}$

It is to be emphasized that this distance formula holds only for euclidean coordinate
systems and is not valid for other cartesian systems.
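
The warning about non-euclidean cartesian systems can be made concrete. The sketch below assumes a euclidean plane with orthonormal base e1, e2 and a second, hypothetical cartesian system with the same origin whose base vectors are f1 = e1 and f2 = e1 + e2; that skew base is an assumption chosen for this illustration.

```python
import math

def dist(p, q):
    """Right-hand side of the distance formula (11.18), applied to coordinate tuples."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))

def skew_to_euclidean(c):
    """Coordinates (y1, y2) relative to f1 = e1, f2 = e1 + e2, rewritten relative to e1, e2."""
    y1, y2 = c
    return (y1 + y2, y2)

p_skew, q_skew = (0, 0), (0, 1)   # the same two points, in skew coordinates

true_distance = dist(skew_to_euclidean(p_skew), skew_to_euclidean(q_skew))
naive = dist(p_skew, q_skew)      # formula misapplied to the skew coordinates

print(true_distance)  # length of f2, i.e. the square root of 2
print(naive)          # 1.0, which is not the distance
```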

Recall that the jth coordinate axis $L_j$ of the given coordinate system is the line
consisting of all points whose coordinates are all zero except the jth. The vector
space of translations $T(L_j)$ is spanned by $\mathbf{e}_j$, and we orient $L_j$ by assigning to it that
unit vector. There is a unique point $p_j$ on $L_j$ such that $\overrightarrow{p_0 p_j} = \mathbf{e}_j$, and so $\mathbf{e}_j$ is
represented by the pair of points consisting of the origin $p_0$ and the point $p_j$, which is at
distance 1 from the origin. The jth coordinate hyperplane $H_j$ consists of all points†
$(x_1, x_2, \ldots, x_n)$ such that $x_j = 0$. It is clear that $L_j$ and $H_j$ are orthogonal.
Consider now an oriented line L in E, with associated unit translation u. Let
the components of u with respect to the base $\mathbf{e}_j$ be $u_1, \ldots, u_n$, and let $\alpha_j$ be the
angle between L and the axis $L_j$. Then from (11.15) and (11.11) we have

11.19    $u_j = (\mathbf{u}, \mathbf{e}_j) = \cos\alpha_j$    (j = 1, ..., n)

These numbers are sometimes called the direction cosines of the line L.
Let $p_1$ and q be two points of L, and let their coordinates be $a_1, \ldots, a_n$ and
$x_1, \ldots, x_n$, respectively. Then
$\overrightarrow{p_1 q} = \overrightarrow{p_0 q} - \overrightarrow{p_0 p_1} = (x_1 - a_1) \cdot \mathbf{e}_1 + \cdots + (x_n - a_n) \cdot \mathbf{e}_n$.
But $\overrightarrow{p_1 q}$ is in the space spanned by the unit vector u, and so $\overrightarrow{p_1 q} = t\mathbf{u}$,
for some real number t. In fact, from (11.2) it follows that $t = \pm|p_1 q|$. Now
writing u out in terms of the base vectors and comparing the two expressions for
$\overrightarrow{p_1 q}$ we get $x_j - a_j = t u_j$, or

11.20    $x_j = a_j + t u_j$    (j = 1, ..., n)

These equations are called parametric equations for L. They are the same as
(10.19), with $b_j - a_j$ in place of $u_j$, save that in (10.19) the elements $b_j - a_j$ may
not be the components of a unit vector, since it is impossible to define that notion
in the case of an arbitrary field of scalars. As t varies over all real numbers, the
point $(x_1, \ldots, x_n)$ defined by (11.20) varies over L. Because of this auxiliary
role, t is called a parameter. If $b_1, \ldots, b_n$ and $v_1, \ldots, v_n$ are any real numbers,
of which the last n are not all zero, and if we put

11.21    $x_j = b_j + t v_j$    (j = 1, ..., n)

then the set of all points $(x_1, \ldots, x_n)$ obtained by letting t vary through R is a
line L′ in E. For put $\mathbf{v} = v_1 \mathbf{e}_1 + \cdots + v_n \mathbf{e}_n$, and let $q_1$ be the point with
coordinates $b_1, \ldots, b_n$. Then (11.21) is equivalent to $\overrightarrow{q_1 q} = t\mathbf{v}$, and therefore L′
consists of all points q such that $\overrightarrow{q_1 q}$ is in the one-dimensional vector space spanned
by v (of course v may not be a unit vector).

Returning to L above, let us now consider the hyperplane H that is orthogonal to
L and contains $p_1$. Referring to (11.16), H consists of all points q such that
$(\mathbf{u}, \overrightarrow{p_1 q}) = 0$, and by (11.4) this is so if and only if

11.22    $u_1 (x_1 - a_1) + \cdots + u_n (x_n - a_n) = 0$

where again $x_1, \ldots, x_n$ are the coordinates of q. Hence Eq. (11.22) determines H
completely, and it is called the equation of H. Putting
$u_0 = -(u_1 a_1 + \cdots + u_n a_n)$ we can write (11.22) as


† In referring to a point it is often convenient to represent it by its coordinates. Thus
such a phrase as "let $(x_1, \ldots, x_n)$ be a point of E" means "let us consider the point of
E whose coordinates are $x_1, \ldots, x_n$."




11.23    $u_0 + u_1 x_1 + \cdots + u_n x_n = 0$

Conversely, if $v_0, v_1, \ldots, v_n$ are any real numbers, of which the last n are not all
zero, then the equation

11.24    $v_0 + v_1 x_1 + \cdots + v_n x_n = 0$

determines a hyperplane H′, and it is orthogonal to the line L′ defined by (11.21).
The argument is essentially that given at the end of Sec. 10 in connection with
(10.15).
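
The passage from a line to its direction cosines, its parametric equations, and the orthogonal hyperplane through one of its points can be followed on a numerical example. This is a sketch assuming a euclidean coordinate system in 3-space and made-up sample points; all function names are illustrative.

```python
import math

def direction_cosines(p, q):
    """Components of the unit vector pointing from p to q, as in (11.19)."""
    d = [b - a for a, b in zip(p, q)]
    norm = math.sqrt(sum(c * c for c in d))
    return [c / norm for c in d]

def line_point(a, u, t):
    """Parametric equations (11.20): x_j = a_j + t*u_j."""
    return [aj + t * uj for aj, uj in zip(a, u)]

def hyperplane_coeffs(a, u):
    """Coefficients [u0, u1, ..., un] of (11.23), with u0 = -(u1*a1 + ... + un*an)."""
    u0 = -sum(uj * aj for uj, aj in zip(u, a))
    return [u0] + list(u)

def hyperplane_value(coeffs, x):
    """Left-hand side of (11.23) at the point x; zero exactly on the hyperplane."""
    return coeffs[0] + sum(c * xj for c, xj in zip(coeffs[1:], x))

p1, q = (1, 2, 2), (1, 5, 6)
u = direction_cosines(p1, q)          # close to (0, 0.6, 0.8)
coeffs = hyperplane_coeffs(p1, u)

print(math.isclose(hyperplane_value(coeffs, p1), 0.0, abs_tol=1e-12))
print(hyperplane_value(coeffs, line_point(p1, u, 5.0)))
```

Because u is a unit vector, the left-hand side of (11.23) evaluated at the point of L with parameter t comes out equal to t (up to rounding), so the line meets this hyperplane only at t = 0, that is, at $p_1$.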


EXERCISES 

1. Let V be a euclidean vector space of finite dimension n, and let $\mathbf{v}_1, \ldots, \mathbf{v}_n$
be a set of n orthonormal vectors in V. Prove that they form a base for V and that
(11.1) holds for that base.


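2. Let V be as above and let $\mathbf{w}_1, \ldots, \mathbf{w}_k$ be any set of linearly independent
vectors in V. Define a new set of k vectors $\mathbf{v}_1, \ldots, \mathbf{v}_k$ by the equations
$\mathbf{v}_1 = \mathbf{w}_1$, $\mathbf{v}_2 = \mathbf{w}_2 - a_{21}\mathbf{v}_1$, $\mathbf{v}_3 = \mathbf{w}_3 - a_{31}\mathbf{v}_1 - a_{32}\mathbf{v}_2$,
$\mathbf{v}_4 = \mathbf{w}_4 - a_{41}\mathbf{v}_1 - a_{42}\mathbf{v}_2 - a_{43}\mathbf{v}_3$, ...,
$\mathbf{v}_k = \mathbf{w}_k - a_{k1}\mathbf{v}_1 - a_{k2}\mathbf{v}_2 - \cdots - a_{k,k-1}\mathbf{v}_{k-1}$, where the $a_{ij}$ are real numbers.
Prove that none of the $\mathbf{v}_j$ can be zero. Show that the numbers $a_{ij}$ can be
determined uniquely in such a way that the $\mathbf{v}_j$ are all orthogonal. (This method of
obtaining orthogonal vectors is called the Schmidt orthogonalization process.)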

3. Make $R_3$ into a euclidean vector space by defining $|(x_1, x_2, x_3)|$ to be
$(x_1^2 + x_2^2 + x_3^2)^{1/2}$. Apply the Schmidt process to the following set of three vectors:
, 8, 4), (8, −1, 2), (1, 1, 5). Find an orthonormal base for the subspace of $R_3$
spanned by (4, 1, −1) and (1, −1, 2).

4. V being as in Exercise 1, let U be a subspace of dimension r. Show that U is a
euclidean vector space if the length of a vector in U is taken to be the same as its
length in V. Let W be the set of all vectors in V which are orthogonal to every
vector in U. Prove that W is a subspace of dimension n − r and that V = U ⊕ W
(W is called the orthogonal complement of U in V).

5. Show that the line defined by (11.21) intersects the plane defined by (11.24) 
in exactly one point. 

6. Let L, L′, L″ be three lines in an n-dimensional euclidean space E, with
L′ and L″ parallel. Prove that L′ and L″ can be oriented in such a way that
∢ L, L′ = ∢ L, L″ for either orientation of L. Show that if L′ and L″ are distinct,
then they have no points in common. Show that if p is a point not on L, then
there is one and only one line through p which is parallel to L. If n = 2 show that
two distinct lines are parallel if and only if they have no points in common.

7. Let p, q, r be distinct points in a euclidean n-space E. Prove that the sum of
the interior angles of the triangle they determine is equal to π. Prove the law of
sines for the triangle.

8. Let P be a plane in a euclidean 3-space $E_3$. Let q be a point not in P. Prove
that there is one and only one point p in P for which the distance |pq| is a minimum,
and show that the line determined by p and q is orthogonal to P.


12. Analytic geometry 


As we have seen in Sec. 10, it is possible to introduce a cartesian coordinate system
in any affine space of finite dimension. In this way it is possible to translate certain
types of geometrical questions into algebraic problems, and vice versa. This is
the whole point of analytic geometry, and we have already encountered some
examples, for instance in (11.20) to (11.24), dealing with lines and hyperplanes, and in
some of the exercises. Here we shall go into a little more detail in the special case
of a two-dimensional euclidean space $E_2$. Everything to be discussed can be
generalized without much difficulty to euclidean spaces of higher dimension. However
it is rather cumbersome to do so without the use of matrices, discussed in Chap. 9.

According to Sec. 11 we obtain a euclidean coordinate system in $E_2$ by selecting a
point $p_0$ as origin and an orthonormal base $\mathbf{e}_1$, $\mathbf{e}_2$ for the vector space of translations
$T(E_2)$. Then for any point q the translation vector $\overrightarrow{p_0 q}$ can be expressed uniquely
in the form

12.1    $\overrightarrow{p_0 q} = x\,\mathbf{e}_1 + y\,\mathbf{e}_2$

where x and y are real numbers. They are the coordinates of q for the particular
system. If p is another point, having coordinates $x_1$, $y_1$, then the distance formula
(11.18) becomes

12.2    $|pq| = \sqrt{(x - x_1)^2 + (y - y_1)^2}$


Now let $p_1$, $p_2$ be two distinct points of $E_2$, and let L be the line they determine.
For any point q on L the translation vector $\overrightarrow{p_1 q}$ must be in the space spanned by
$\overrightarrow{p_1 p_2}$ because T(L) has dimension 1. Therefore $\overrightarrow{p_1 q} = t \cdot \overrightarrow{p_1 p_2}$ for some real
number t. If the coordinates of $p_1$, $p_2$, q are, respectively, $(x_1, y_1)$, $(x_2, y_2)$, (x, y),
then this vector equation is the same as
$(x - x_1) \cdot \mathbf{e}_1 + (y - y_1) \cdot \mathbf{e}_2 = t \cdot (x_2 - x_1) \cdot \mathbf{e}_1 + t \cdot (y_2 - y_1) \cdot \mathbf{e}_2$,
from which we get

12.3    $x = x_1 + t\,(x_2 - x_1)$
        $y = y_1 + t\,(y_2 - y_1)$

As t varies through R, all the points (x, y) obtained from (12.3) form the line L.
These are parametric equations for L, special cases of (10.19) and (11.21).

Now let $\mathbf{u}$ be a unit vector in T(L), say $\mathbf{u} = u_1 \mathbf{e}_1 + u_2 \mathbf{e}_2$. There are two unit 
vectors in T(L), $\pm\mathbf{u}$, and we suppose $\mathbf{u}$ is the one for which $u_2 \ge 0$ (or else $u_1 > 0$ 
if $u_2 = 0$). We orient L by assigning to it that unit vector. The direction cosines 
of L [see (11.19)] are 

$u_1 = \cos \alpha_1 = (\mathbf{u}, \mathbf{e}_1)$        $u_2 = \cos \alpha_2 = (\mathbf{u}, \mathbf{e}_2)$

Since $u_1^2 + u_2^2 = 1$, we have $u_2^2 = 1 - u_1^2 = 1 - \cos^2 \alpha_1 = \sin^2 \alpha_1$. By assumption 
$u_2 \ge 0$ and $0 \le \alpha_1 \le \pi$. Hence $u_2 = \sin \alpha_1$. Let us therefore write $\alpha$ for $\alpha_1$, 
so that 

$u_1 = \cos \alpha$        $u_2 = \sin \alpha$


Now $\overrightarrow{p_1 p_2} = c \cdot \mathbf{u}$ for some number c, and in fact $c = \pm|p_1 p_2|$. Hence 

12.4    $x_2 - x_1 = c \cos \alpha$        $y_2 - y_1 = c \sin \alpha$

Putting $s = ct$ we can write (12.3) in the form 

12.5    $x = x_1 + s \cos \alpha$        $y = y_1 + s \sin \alpha$

If L is not parallel to the y axis, that is, if $\alpha \ne \pi/2$, then $\cos \alpha \ne 0$. Set 

12.6    $m = \tan \alpha$


The number m is called the slope of L. From (12.4), $m = (y_2 - y_1)/(x_2 - x_1)$. 
From (12.5), $m = (y - y_1)/(x - x_1)$. Hence we can get rid of the parameter s in 
(12.5) by writing 

12.7    $y - y_1 = m(x - x_1)$
or 
12.8    $y - y_1 = \dfrac{y_2 - y_1}{x_2 - x_1}\,(x - x_1)$


The situation is illustrated in Fig. 10. 

Of course (12.7) and (12.8) do not make sense in the case of a line parallel to the 
y axis ($\alpha = \pi/2$). In that case (12.5) reduces to $x = x_1$ and $y = y_1 + s$. Since 
the parameter s can take on all values, so can y. Therefore the line L consists of all 
points satisfying the equation 

12.9    $x = x_1$

Observe that (12.7) and (12.9) are special cases of the equation 

12.10    $ax + by + c = 0$

For that is what (12.7) is if we put $a = m$, $b = -1$, $c = y_1 - mx_1$; and (12.9) is 
just (12.10) with $a = 1$, $b = 0$, $c = -x_1$. Hence the equation of any line in $E_2$ 


Figure 10 




can be put in the form (12.10). Conversely, if a, b are not both zero, then (12.10) 
is the equation of a line. For if $b \ne 0$, then it can be written 

$y + \dfrac{c}{b} = -\dfrac{a}{b}\,x$

which is the same as (12.7) with $m = -a/b$, $x_1 = 0$, and $y_1 = -c/b$. If $b = 0$, 
then (12.10) becomes $x = -c/a$, which is the same as (12.9) with $x_1 = -c/a$. 

Equation (12.10) is (10.15) for n = 2. 

The equations above contain all the information that one could possibly need to 
know about lines in $E_2$, and we shall therefore leave the subject of lines with the 
following example: Find the equation of the line through (2, 1) and (4, 5). Putting 
these numbers in (12.8) we get $y - 1 = (x - 2)(5 - 1)/(4 - 2)$, or $y = 2x - 3$. 
The slope here is 2, and the angle $\alpha$ is approximately 63°. We have $\tan^2 \alpha = 
\sin^2 \alpha/(1 - \sin^2 \alpha) = 4$, or $5 \sin^2 \alpha = 4$, whence $\sin \alpha = 2/\sqrt{5}$ and $\cos \alpha = 1/\sqrt{5}$. 
The parametric equations (12.5) are 

$x = 2 + s/\sqrt{5}$        $y = 1 + 2s/\sqrt{5}$

Putting $s = 0$ gives the first point (2, 1); putting $s = 2\sqrt{5}$ gives the second point 
(4, 5); and $2\sqrt{5}$ is the distance between them. Since $\tan(\alpha + \pi/2) = -1/\tan \alpha$, 
the line perpendicular to the given line and passing through (a, b) has equation 

$y - b = -\tfrac{1}{2}(x - a)$

by (12.7). 
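
The computation in this example can be checked mechanically. The following is a minimal sketch, not part of the original text: the function names `line_through` and `point_at` are our own, and the unit vector is taken to point from the first point toward the second, which agrees with the orientation convention above whenever the second point has the larger y coordinate.

```python
import math

def line_through(p1, p2):
    """Return (a, b, c) with a*x + b*y + c = 0 for the line through p1 and p2."""
    (x1, y1), (x2, y2) = p1, p2
    if (x1, y1) == (x2, y2):
        raise ValueError("the two points must be distinct")
    # Equation (12.8) cleared of denominators:
    # (y2 - y1)*(x - x1) - (x2 - x1)*(y - y1) = 0
    a = y2 - y1
    b = -(x2 - x1)
    c = -(a * x1 + b * y1)
    return a, b, c

def point_at(p1, p2, s):
    """Point at signed distance s from p1 along the line, as in (12.5)."""
    (x1, y1), (x2, y2) = p1, p2
    length = math.hypot(x2 - x1, y2 - y1)
    cos_a = (x2 - x1) / length   # direction cosines of the unit vector
    sin_a = (y2 - y1) / length
    return x1 + s * cos_a, y1 + s * sin_a

# The line through (2, 1) and (4, 5): 4x - 2y - 6 = 0, that is, y = 2x - 3.
print(line_through((2, 1), (4, 5)))
# The parameter value s = 2*sqrt(5) should land on the second point.
print(point_at((2, 1), (4, 5), 2 * math.sqrt(5)))
```

Returning the coefficients of (12.10) rather than the slope avoids a special case for lines parallel to the y axis.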

Lines in $E_2$ correspond to Eqs. (12.10) of the first degree. We now consider 
briefly some problems leading to second-degree equations. First of all let $q_0$ be a 
point of $E_2$ and let a be a positive number. The locus† C consisting of all points q 
such that $|q_0 q| = a$ is by definition a circle of radius a and center $q_0$. If $q_0$ has 
coordinates $x_0, y_0$ and if q has coordinates x, y, then from (12.2) q is on C if and only if 

12.11    $(x - x_0)^2 + (y - y_0)^2 = a^2$

This equation therefore determines C completely. We point out that any equation 
of the type 

12.12    $x^2 + dx + y^2 + ey + f = 0$

can be put in the form (12.11) by simply adding $(d/2)^2 + (e/2)^2 - f$ to both sides. 
The result is 

$(x + d/2)^2 + (y + e/2)^2 = (d/2)^2 + (e/2)^2 - f$

The left-hand side cannot be negative, and so there can be no points satisfying this 
equation unless $(d/2)^2 + (e/2)^2 - f \ge 0$. If that is the case, then (12.12) is the 
equation of a circle of radius $[(d/2)^2 + (e/2)^2 - f]^{1/2}$ and center $(-d/2, -e/2)$. 
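
As an illustration of completing the square in (12.12), here is a short sketch (the function name `circle_from_equation` is hypothetical, chosen only for this example). It returns the center and radius, or `None` when the right-hand side is negative and the locus is empty.

```python
import math

def circle_from_equation(d, e, f):
    """Center and radius of x^2 + d*x + y^2 + e*y + f = 0, or None if empty."""
    rhs = (d / 2) ** 2 + (e / 2) ** 2 - f   # right-hand side after completing squares
    if rhs < 0:
        return None                          # no point satisfies the equation
    center = (-d / 2, -e / 2)
    return center, math.sqrt(rhs)

# x^2 - 4x + y^2 + 6y - 12 = 0 is the circle of radius 5 about (2, -3).
print(circle_from_equation(-4, 6, -12))
```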


† The set of all points satisfying some specified conditions is sometimes called the locus 
satisfying those conditions. 


Figure 11 


Now let us take up a somewhat more complicated problem. Let $q_1, q_2$ be two 
points of $E_2$, and consider the locus of all points q such that 

12.13    $|q_1 q| + |q_2 q| = 2h$

where h is some positive number. From the triangle inequality (11.14) it is clear 
that there are no such points q unless $2h \ge |q_1 q_2|$. If $2h = |q_1 q_2|$, then it is easily 
seen that the locus consists of all points between $q_1$ and $q_2$ on the line joining them. 
We shall suppose that $2h > |q_1 q_2|$. For simplicity let us assume that $q_1$ and $q_2$ are 
on the x axis, equidistant from the origin. Their coordinates will then be (c, 0), 
(−c, 0), where c > 0. The situation is illustrated in Fig. 11. We have 
$|q_1 q_2| = 2c$, and we assume that 2h > 2c, or h > c. If q has coordinates (x, y), 
then (12.13) becomes $[(x - c)^2 + y^2]^{1/2} + [(x + c)^2 + y^2]^{1/2} = 2h$. To simplify this, 
square both sides and transpose to obtain 

$(x - c)^2 + y^2 + (x + c)^2 + y^2 - 4h^2 = -2[(x - c)^2 + y^2]^{1/2}\,[(x + c)^2 + y^2]^{1/2}$

Square again and collect terms. The result is 

12.14    $(h^2 - c^2)x^2 + h^2 y^2 = h^2(h^2 - c^2)$

We have assumed that h > c, and so $h^2 > c^2$. Put 

12.15    $k = \sqrt{h^2 - c^2}$

Then (12.14) is 

12.16    $k^2 x^2 + h^2 y^2 = h^2 k^2$

Dividing by $h^2 k^2$ we can write this in the so-called normal form 

12.17    $\dfrac{x^2}{h^2} + \dfrac{y^2}{k^2} = 1$

The locus in question is of course called an ellipse, and the points on it are precisely 
the points whose coordinates satisfy (12.17). Putting y = 0 in it we get $x = \pm h$. 
Hence the locus cuts the x axis at $(\pm h, 0)$. Similarly it cuts the y axis at $(0, \pm k)$. 
The points $q_1$ and $q_2$ are called the foci of the ellipse, and h, k are its semimajor and 
semiminor axes, respectively. 
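
The passage from (12.13) to (12.17) can be tested numerically in the reverse direction. The sketch below is an illustration only: it assumes the parametrization $(h\cos\theta, k\sin\theta)$ of the points satisfying (12.17), which is not used in the text, and the function name is our own.

```python
import math

def focal_distance_sum(h, k, theta):
    """Sum of the distances from (h cos(theta), k sin(theta)) to the foci (c, 0), (-c, 0)."""
    c = math.sqrt(h * h - k * k)                     # from (12.15): k^2 = h^2 - c^2
    x, y = h * math.cos(theta), k * math.sin(theta)  # a point satisfying (12.17)
    return math.hypot(x - c, y) + math.hypot(x + c, y)

# For every theta the sum should equal 2h, as required by (12.13).
for theta in (0.0, 0.7, 1.5, 3.0, 5.0):
    print(theta, focal_distance_sum(3.0, 2.0, theta))
```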

Suppose now that instead of the sum of the two distances in (12.13) we take their 
difference, or more precisely its absolute value: 


12.18    $\bigl|\,|q_1 q| - |q_2 q|\,\bigr| = 2h$

First of all it is clear from the triangle inequality that there are no points q 
satisfying this condition unless $2h \le 2c$, or $h \le c$. Going through the calculations 
analogous to those for (12.13) we again arrive at (12.14), but now we have $c^2 \ge h^2$. We 
shall assume that c > h > 0 (the excluded cases are easy to analyze). This time 
we put 

12.19    $k = \sqrt{c^2 - h^2}$

Then (12.14) becomes 

12.20    $k^2 x^2 - h^2 y^2 = h^2 k^2$

Since we have assumed h > 0, k > 0, we can divide through by $h^2 k^2$ to put this in 
the normal form 

12.21    $\dfrac{x^2}{h^2} - \dfrac{y^2}{k^2} = 1$

The locus is called a hyperbola and is shown in Fig. 12. The points $q_1, q_2$ are called 
its foci. Putting y = 0 in (12.21) we see that the locus intersects the x axis at 
$(\pm h, 0)$. However it does not intersect the y axis. The two lines $y = (k/h)x$ and 
$y = -(k/h)x$ shown in Fig. 12 are called the asymptotes of the hyperbola. If we 
write (12.21) in the form $(x/h - y/k) \cdot (x/h + y/k) = 1$, then for large |x| one of 


Figure 12 


Figure 13 


the two factors must be large. Hence the other one must be small. If for example 
the first factor is small, then y is approximately equal to $(k/h)x$. This shows 
roughly that the hyperbola is approximated by its asymptotes for large values of |x|. 
Finally let us consider the locus consisting of all points q such that the distance 
from q to a given point $q_0$ is equal to the distance from q to a given line L (i.e., the 
perpendicular distance). Let us take $q_0$ to be the point (0, c) on the y axis, and let 
us take L to be the line $y = -c$ parallel to the x axis, as shown in Fig. 13. The 
condition here on the point (x, y) is $[x^2 + (y - c)^2]^{1/2} = y + c$. Squaring we get 

12.22    $x^2 = 4cy$

The locus is called a parabola, the point (0, c) is called its focus, and the line L is 
called its directrix. 
Both (12.17) and (12.21) are special cases of equations of the type 

12.23    $ax^2 + by^2 = c$

Conversely, with certain simple exceptions, this equation can be put in one of those 
two forms. We assume of course that a and b are not both zero here. If one of 
them is zero, say b, then the equation is $x^2 = c/a$. There are no points (x, y) 
satisfying this equation if c/a < 0. If $c/a \ge 0$, then the equation can be written 
$x = \pm\sqrt{c/a}$, and its locus therefore consists of two vertical lines (coinciding if c = 0). 
Similar remarks apply if a = 0, $b \ne 0$. Let us now assume that a, b are both 
different from zero. Consider first the case c = 0, so that our equation is 
$ax^2 + by^2 = 0$. If a and b have the same sign, then the only point satisfying the equation 
is (0, 0). If a and b have opposite signs, say a > 0 and b < 0, then writing $b' = -b$, 
we can put the equation in the form $ax^2 - b'y^2 = (\sqrt{a}\,x + \sqrt{b'}\,y)(\sqrt{a}\,x - \sqrt{b'}\,y) = 0$. 
A point (x, y) satisfies this equation if and only if it is on one of the 
two lines $y = \sqrt{a/b'} \cdot x$ or $y = -\sqrt{a/b'} \cdot x$. Hence, if one or more of the three 
numbers a, b, c in (12.23) is zero, then either there are no points (x, y) which satisfy 
the equation, or else (0, 0) is the only point which satisfies it, or else the equation has a 
locus consisting of two lines, possibly coincident. 




Let us now assume that a, b, c are all different from zero. Then putting $a' = a/c$ 
and $b' = b/c$, we get $a'x^2 + b'y^2 = 1$. If $a', b'$ are both negative, then there is no 
point (x, y) which satisfies the equation. If $a', b'$ are both positive, then setting 
$h = 1/\sqrt{a'}$, $k = 1/\sqrt{b'}$ we get an equation of the type (12.17), and the locus is 
an ellipse. If $a' > 0$, $b' < 0$, then putting $h = 1/\sqrt{a'}$, $k = 1/\sqrt{-b'}$ we obtain an 
equation of the form (12.21), and the locus is a hyperbola. If $a' < 0$, $b' > 0$ then 
we get a hyperbola, with the x and y axes interchanged. 
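
The case analysis for (12.23) is summarized in the following sketch. The function name `classify_central` and the strings it returns are our own labels, introduced only for illustration; the branches follow the discussion above case by case.

```python
def classify_central(a, b, c):
    """Describe the locus of a*x^2 + b*y^2 = c, following the cases in the text."""
    if a == 0 and b == 0:
        raise ValueError("a and b must not both be zero")
    if a == 0 or b == 0:
        # Only one square is present: x^2 = c/a or y^2 = c/b.
        ratio = c / (a if a != 0 else b)
        if ratio < 0:
            return "empty"
        return "two parallel lines" if ratio > 0 else "two coincident lines"
    if c == 0:
        # a*x^2 + b*y^2 = 0 with a, b both nonzero.
        return "single point" if (a > 0) == (b > 0) else "two intersecting lines"
    a1, b1 = a / c, b / c          # a' and b' of the text
    if a1 < 0 and b1 < 0:
        return "empty"
    if a1 > 0 and b1 > 0:
        return "ellipse"
    return "hyperbola"

print(classify_central(1 / 9, 1 / 4, 1))   # ellipse with h = 3, k = 2
print(classify_central(1, -1, 1))          # hyperbola
print(classify_central(1, -4, 0))          # two intersecting lines
```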

A locus of an equation of the type (12.22) or (12.23) is called a conic section, 
except for the case of a locus consisting of two distinct parallel lines. Now both 
(12.22) and (12.23) are special cases of the general equation of second degree in x, y: 

12.24    $ax^2 + 2bxy + cy^2 + dx + ey + f = 0$

[We assume that a, b, c are not all zero here, for otherwise the equation reduces to 
one of the type (12.10).] The high point of many texts on analytic geometry is 
the observation that (12.24) can be put either in the form (12.22) or (12.23) by 
means of a suitable change of coordinate system in $E_2$. Consequently the locus of 
an equation of the type (12.24), unless empty, is always a conic section or else a 
pair of parallel lines. 

First of all, if b = 0 in (12.24), the result can be achieved by means of a 
translation. For suppose that a, c are both different from zero. Adding $d^2/4a + e^2/4c$ 
to both sides we obtain 

$a(x + d/2a)^2 + c(y + e/2c)^2 = d^2/4a + e^2/4c - f$

If we introduce new coordinates $x', y'$ in $E_2$ by putting $x' = x + d/2a$, $y' = y + e/2c$, 
which corresponds to choosing $(-d/2a, -e/2c)$ as a new origin, then our equation 
becomes 

$ax'^2 + cy'^2 = g$

where $g = d^2/4a + e^2/4c - f$, and this is indeed of the type (12.23). 
If one of the numbers a, c is also zero, say c = 0, then by adding $d^2/4a$ to both 
sides of (12.24) we get 

$a(x + d/2a)^2 = -f - ey + d^2/4a$

Suppose $e \ne 0$ (if e = 0 the equation is already of the type (12.23) with b = 0). 
Put $x' = x + d/2a$ and $y' = y + f/e - d^2/4ae$. Then our equation can be written 

$x'^2 = -\dfrac{e}{a}\,y'$

which is of the type (12.22). Here we have simply translated coordinates by taking 
the point $(-d/2a, -f/e + d^2/4ae)$ as the new origin. Of course a similar reduction is 
possible if a = 0, $c \ne 0$. 

Thus it is a simple matter to deal with (12.24) if the offending term $2bxy$ is 
absent. We shall now see how to make it disappear. Let us choose a new euclidean 




coordinate system $(x', y')$ in $E_2$, with the same origin $p_0$ as in (12.1). To do so we 
must select another pair of orthonormal vectors $\mathbf{e}'_1, \mathbf{e}'_2$ in the space of translations 
$T(E_2)$. If q is any point of $E_2$, then $\overrightarrow{p_0 q}$ can be expressed uniquely in the form 

12.25    $\overrightarrow{p_0 q} = x'\,\mathbf{e}'_1 + y'\,\mathbf{e}'_2$

and $x', y'$ are the new coordinates of q. Now both $\mathbf{e}'_1, \mathbf{e}'_2$ can similarly be expressed 
as linear combinations of $\mathbf{e}_1, \mathbf{e}_2$, say 

$\mathbf{e}'_1 = a_1 \mathbf{e}_1 + a_2 \mathbf{e}_2$
$\mathbf{e}'_2 = b_1 \mathbf{e}_1 + b_2 \mathbf{e}_2$

By assumption $(\mathbf{e}'_1, \mathbf{e}'_1) = 1$, $(\mathbf{e}'_1, \mathbf{e}'_2) = 0$, and $(\mathbf{e}'_2, \mathbf{e}'_2) = 1$. By (11.4) there 
follows $a_1^2 + a_2^2 = 1$, $b_1^2 + b_2^2 = 1$, $a_1 b_1 + a_2 b_2 = 0$. From this it is easily seen 
that $a_1 b_2 - a_2 b_1 \ne 0$. We shall assume that $\mathbf{e}'_1$ and $\mathbf{e}'_2$ are so numbered that 
$a_1 b_2 - a_2 b_1 > 0$. Now from the foregoing relations on $a_1, a_2, b_1, b_2$ and this last 
assumption it follows easily that $b_2 = a_1$ and $b_1 = -a_2$. To simplify notation let 
us put $a_1 = u$ and $a_2 = v$. Then we have 


12.26    $\mathbf{e}'_1 = u\,\mathbf{e}_1 + v\,\mathbf{e}_2$
         $\mathbf{e}'_2 = -v\,\mathbf{e}_1 + u\,\mathbf{e}_2$        $(u^2 + v^2 = 1)$


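Since $u = (\mathbf{e}_1, \mathbf{e}'_1)$, we have $u = \cos \theta$, where $\theta$ is the angle between $\mathbf{e}_1$ and $\mathbf{e}'_1$. 
Hence $v = \pm \sin \theta$. (Recall that by definition $0 \le \theta \le \pi$, and so $\sin \theta \ge 0$.) 
Define $\varphi$ as follows: 

$\varphi = \theta$ if $v \ge 0$        and        $\varphi = 2\pi - \theta$ if $v < 0$

Then 

$u = \cos \varphi$        $v = \sin \varphi$

We say therefore that the new coordinate system $(x', y')$ is obtained from the 
original one by rotation through the angle $\varphi$. 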
Substituting (12.26) in (12.25) and comparing with (12.1) we get 

12.27    $x = ux' - vy'$
         $y = vx' + uy'$

These are special cases of the general Eqs. (10.14). 
Now set 

12.28    $Q = ax^2 + 2bxy + cy^2$

which is the quadratic part of (12.24). Substituting (12.27) in this we get 

12.29    $Q = a'x'^2 + 2b'x'y' + c'y'^2$

where 

12.30    $b' = -auv + b(u^2 - v^2) + cuv$


and similar expressions for $a', c'$. Assuming that $b \ne 0$ we can make $b' = 0$ by 
choosing u and v so that $u^2 - v^2 = [(a - c)/b]\,uv$, or $\cos^2 \varphi - \sin^2 \varphi = 
[(a - c)/b] \sin \varphi \cos \varphi$, or 

12.31    $\cot 2\varphi = \dfrac{a - c}{2b}$

That being done it follows that (12.24) in the new coordinate system is of the form 

12.32    $a'x'^2 + c'y'^2 + d'x' + e'y' + f = 0$

As pointed out above this can be reduced by a translation to one of the forms 
(12.23) or (12.22), or else (12.22) with the coordinates interchanged. 
It is not hard to verify that 

12.33    $ac - b^2 = a'c' - b'^2$

in (12.28) and (12.29), for any choice of $\varphi$. This quantity is called the discriminant 
of Q, and Q itself is called a quadratic form. If $\varphi$ is chosen according to (12.31), 
then $b' = 0$, and so $ac - b^2 = a'c'$. It follows that the locus of (12.24) can be a 
parabola only if the discriminant is zero, an ellipse only if the discriminant is 
positive, and a hyperbola only if the discriminant is negative. 
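
Formulas (12.30), (12.31), and (12.33) lend themselves to a numerical check. In the sketch below the expressions for $a'$ and $c'$, which the text leaves as "similar expressions", are obtained by carrying out the substitution (12.27) in (12.28); the function names are hypothetical, and the angle is taken in the range $0 < 2\varphi < \pi$, which is one admissible choice among several.

```python
import math

def rotate_quadratic_form(a, b, c, phi):
    """Coefficients (a', b', c') of Q = a x^2 + 2 b x y + c y^2 after the rotation (12.27)."""
    u, v = math.cos(phi), math.sin(phi)
    a_new = a * u * u + 2 * b * u * v + c * v * v
    b_new = -a * u * v + b * (u * u - v * v) + c * u * v   # this is (12.30)
    c_new = a * v * v - 2 * b * u * v + c * u * u
    return a_new, b_new, c_new

def diagonalizing_angle(a, b, c):
    """An angle phi with cot(2 phi) = (a - c)/(2 b), assuming b != 0; here 0 < 2 phi < pi."""
    return 0.5 * math.atan2(1.0, (a - c) / (2 * b))

# A sample form with a = 21, b = -5*sqrt(3), c = 31.
a, b, c = 21.0, -5 * math.sqrt(3), 31.0
phi = diagonalizing_angle(a, b, c)
print(phi, rotate_quadratic_form(a, b, c, phi))

# The discriminant a*c - b^2 is unchanged by an arbitrary rotation, as in (12.33).
a1, b1, c1 = rotate_quadratic_form(a, b, c, 0.7)
print(a * c - b * b, a1 * c1 - b1 * b1)
```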


EXAMPLE 1  The most severe case of the presence of an xy term in (12.24) is the 
equation $xy = 1$. Here $a = c = 0$, $b = 1/2$. According to (12.31) we take $\cot 2\varphi = 0$, 
and so $\varphi = \pi/4$. Then $\cos \varphi = \sin \varphi = 1/\sqrt{2}$. Equation (12.27) becomes 
$x = (x' - y')/\sqrt{2}$, $y = (x' + y')/\sqrt{2}$. Substituting these in $xy = 1$ we get 
$x'^2/2 - y'^2/2 = 1$; the locus is a hyperbola, with h and k of (12.21) equal to $\sqrt{2}$. 
By (12.19), $c = \sqrt{h^2 + k^2} = 2$; the asymptotes are the lines $y' = \pm x'$. Going back 
to the original coordinate system we see that the foci are at $(\sqrt{2}, \sqrt{2})$ and 
$(-\sqrt{2}, -\sqrt{2})$. The asymptotes are the x and y axes. The discriminant is $-1/4$. 


EXAMPLE 2  Find the locus of the equation $21x^2 - 10\sqrt{3}\,xy + 31y^2 + (84 + 40\sqrt{3})x - 
(248 + 20\sqrt{3})y = -436 - 80\sqrt{3}$. Here $a = 21$, $b = -5\sqrt{3}$, $c = 31$. 
The discriminant is 576, and so, if there is a locus, it must be an ellipse. Using 
(12.31) we determine $\varphi$ by 

$\cot 2\varphi = \dfrac{a - c}{2b} = \dfrac{1}{\sqrt{3}}$

whence $2\varphi = \pi/3$, $\varphi = \pi/6$, and so $\cos \varphi = u = \sqrt{3}/2$, $\sin \varphi = v = 1/2$. The 
rotation (12.27) is then $x = (\sqrt{3}/2)x' - (1/2)y'$ and $y = (1/2)x' + (\sqrt{3}/2)y'$. 
Putting these in the given equation we get $16x'^2 + 36y'^2 + 32(\sqrt{3} - 2)x' - 
72(1 + 2\sqrt{3})y' = -436 - 80\sqrt{3}$. By choosing a new origin we can get rid of 
the first-degree terms here. Namely, put $x'' = x' + \sqrt{3} - 2$ and $y'' = y' - 1 - 2\sqrt{3}$. 
Then the equation becomes 

$16x''^2 + 36y''^2 = 144$        or        $\dfrac{x''^2}{9} + \dfrac{y''^2}{4} = 1$

which is the equation of an ellipse with $h = 3$, $k = 2$, $c = \sqrt{5}$, in the notation of 
(12.15) and (12.17). The foci are at $(\pm\sqrt{5}, 0)$ in the $x'', y''$ system, and the ellipse 
cuts the axes at $(\pm 3, 0)$ and $(0, \pm 2)$. Using the equations above we can easily find the 
coordinates of these points in the original x, y system. For example, using 

$x = (\sqrt{3}/2)(x'' - \sqrt{3} + 2) - (1/2)(y'' + 1 + 2\sqrt{3})$

and putting in the $x'', y''$ coordinates of the foci, namely, $x'' = \pm\sqrt{5}$, $y'' = 0$, we 
get $x = \tfrac{1}{2}(\pm\sqrt{15} - 4)$ as the x coordinates of the foci. 

Figure 14  An affine transformation of $E_2$ carries any conic section into a conic 
section, thus sending a circle into an ellipse. 

To summarize, the problem of reducing (12.24) to a more agreeable form by 
means of a change of coordinate systems involves first a rotation, in order to banish 
the xy term, and then a translation in order to get rid of the first-degree terms 
(only one of them can be eliminated in general if the discriminant is zero). Problems 
of a similar nature arise in spaces of dimension greater than 2, and then rotations 
become quite complicated. In Chap. 14 we shall develop some methods 
which make it easy to deal with rotations in any number of dimensions. 
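
The two-step reduction just summarized can be checked on Example 2 by running it backward. The sketch below is only a numerical spot check under our own naming: it takes points of the reduced ellipse $x''^2/9 + y''^2/4 = 1$, undoes the translation and then the rotation through $\pi/6$, and confirms that the resulting points satisfy the original equation up to rounding error.

```python
import math

SQRT3 = math.sqrt(3.0)

def original_residual(x, y):
    """Left side minus right side of the equation of Example 2."""
    lhs = (21 * x * x - 10 * SQRT3 * x * y + 31 * y * y
           + (84 + 40 * SQRT3) * x - (248 + 20 * SQRT3) * y)
    rhs = -436 - 80 * SQRT3
    return lhs - rhs

def to_original(x2, y2):
    """Map (x'', y'') to (x, y): undo the translation, then apply the rotation (12.27)."""
    x1 = x2 - SQRT3 + 2            # x' = x'' - sqrt(3) + 2
    y1 = y2 + 1 + 2 * SQRT3        # y' = y'' + 1 + 2*sqrt(3)
    u, v = SQRT3 / 2, 0.5          # cos(pi/6), sin(pi/6)
    return u * x1 - v * y1, v * x1 + u * y1

# Points (3 cos t, 2 sin t) lie on the reduced ellipse; the residual should be ~0.
for t in (0.0, 0.5, 1.0, 2.0, 4.0):
    x, y = to_original(3 * math.cos(t), 2 * math.sin(t))
    print(t, original_residual(x, y))

# x coordinate of the focus with x'' = sqrt(5), y'' = 0; compare (sqrt(15) - 4)/2.
print(to_original(math.sqrt(5.0), 0.0)[0], (math.sqrt(15.0) - 4) / 2)
```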

We mention that there is no good reason for stopping with quadratic equations 
of the type (12.24). The problem of classifying the loci of equations of higher de- 
gree leads to some extremely interesting questions. 


EXERCISES 


1. Find the point on the line $2x + 3y - 1 = 0$ which is closest to the point (2, 7). 
2. Derive a formula which gives the distance from a point $(x_0, y_0)$ to the nearest 
point on the line $ax + by + c = 0$. (Here x, y are understood to be euclidean 
coordinates in $E_2$.) 
3. Find the points at which the line $y = 4x + 1$ intersects the locus $y^2 - 4x^2 = 16$. 
Describe the latter locus, and draw an accurate diagram of it and the line. 
4. Describe the following three loci and draw pictures of them, showing the 
coordinate axes and the foci. 
(a) 22° — xy + dy? + 8x — 25 
(b) 4a? — 122y + 9? +32 ty =2 
(ec) 4a? — 202y + 94° 432 +9 
5. Let $x_1, x_2, x_3$ be euclidean coordinates in a three-dimensional euclidean space 
$E_3$. Let C be the locus of the equation 

$x_1^2 + x_2^2 - x_3^2 = 0$




Prove that the line through the origin and any other point of C lies entirely in C. 

*6. With the notation of Exercise 5, let $E_2$ be any plane in $E_3$. Let x, y be a 
system of euclidean coordinates in $E_2$. Prove that the intersection of $E_2$ with the 
cone C consists of all points in $E_2$ whose coordinates x, y satisfy an equation of the 
type (12.24). 


Linear transformations and matrices 


1. Introduction 


Most applications of vector spaces involve certain kinds of mappings which assign 
to each vector of one vector space U another vector in another (or possibly the 
same) space V. The mappings of interest in this connection are called linear 
mappings (or linear transformations, or operators). In this chapter we shall find 
out how to obtain all such mappings for finite-dimensional spaces and how to 
calculate with them. 


2. A notational convention 


Let V be a vector space of finite dimension n over a field K, and let $\mathbf{v}_1, \ldots, \mathbf{v}_n$ 
be a set of base vectors in V. That is, the $\mathbf{v}_i$ are any n linearly independent vectors 
in V. Then, as we have seen in Chap. 8 (Theorem 9.1), any vector $\mathbf{x}$ in V can be 
written uniquely as a linear combination $\mathbf{x} = x_1 \mathbf{v}_1 + \cdots + x_n \mathbf{v}_n$, with coefficients 
$x_i$ in K. Recall that the $x_i$ are called the components of $\mathbf{x}$ relative to the base $\{\mathbf{v}_i\}$. 
Now we are going to adopt the following rule: Indices on the components $x_i$ will 
henceforth† be written as superscripts rather than subscripts: $x^i$ instead of $x_i$. 
(Thus $x^i$ does not mean x raised to the ith power; it means the ith one of the 
components $x^1, x^2, \ldots, x^n$!) Accordingly we now agree to write 

$\mathbf{x} = x^1 \mathbf{v}_1 + \cdots + x^n \mathbf{v}_n$

or 

$\mathbf{x} = \sum_{i=1}^{n} x^i \mathbf{v}_i$

A considerable saving of space is achieved by simply omitting the summation 
sign, writing just 

2.1    $\mathbf{x} = x^i \mathbf{v}_i$


† Some exceptions to this will be pointed out later. Namely, it is sometimes convenient to 
write the indices on the base vectors as superscripts, rather than those on the components 
(see Sec. 4). 




and understanding that the expression is to be summed over all relevant values of 
the index i, in this case 1 up to n. This abbreviated notation was introduced by 
Einstein; the understanding that the expression is to be summed is called the 
Einstein summation convention. Its use and advantages will become clearer as we 
go along. Presently we shall encounter symbols with several indices, and the 
summation convention will be applied to them also. The general form of the 
summation convention is as follows: In any "product" of symbols with upper and lower 
indices, summation over an index is understood if that index appears exactly once as a 
superscript and exactly once as a subscript; otherwise summation is not understood. 
For example, the symbol $a_{ijk}\,b^i c^j d^h$ stands for 

$\sum_{i=1}^{n} \sum_{j=1}^{n} a_{ijk}\,b^i c^j d^h$

assuming that the indices range from 1 to n. If it is desired to indicate summation 
over k or h also, then that must be done in the usual way. 

As a final remark here, we note, referring to (2.1), that the symbols $x^i \mathbf{v}_i$, $x^j \mathbf{v}_j$, 
$x^k \mathbf{v}_k$, $x^h \mathbf{v}_h$, etc., all mean exactly the same thing, namely, $x^1 \mathbf{v}_1 + \cdots + x^n \mathbf{v}_n$. 
Thus indices which are summed can be changed at will without altering the meaning 
of the symbol, provided of course that one does not infringe upon the rights of 
another index already present. 
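
For readers who compute, the convention (2.1) amounts to a single loop over the repeated index. The following sketch is an illustration only; it assumes that the base vectors are given as lists of numbers (their coordinates in some fixed reference system), and the function name `contract` is our own.

```python
def contract(components, base):
    """Return x = x^i v_i, summing over the repeated index i as in (2.1)."""
    if len(components) != len(base):
        raise ValueError("need exactly one component per base vector")
    width = len(base[0])
    # For each coordinate j of the result, sum x^i * (v_i)_j over i.
    return [sum(xi * vi[j] for xi, vi in zip(components, base))
            for j in range(width)]

# Base v_1 = (1, 1), v_2 = (1, -1); components x^1 = 3, x^2 = 2.
print(contract([3, 2], [[1, 1], [1, -1]]))   # the vector 3*v_1 + 2*v_2
```

The name of the loop variable plays the role of the summed index: renaming it changes nothing, which is the remark made above about $x^i \mathbf{v}_i$ and $x^j \mathbf{v}_j$.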


3. Linear mappings 
The fundamental definition is the following: 


DEFINITION 3.1  A mapping T: U → V of two vector spaces over a field K is called a 
linear mapping if 
(1) T(x + y) = T(x) + T(y)    for any x and y in U 
(2) T(ax) = a · T(x)    for any x in U and any a in K 
In the first of these two equations, T(x) and T(y) stand for vectors in the second 
space V, and the right-hand side of (1) means the sum of those two elements for 
the + operation in V. In the left-hand side of (1), x + y denotes the sum of x 
and y with the + operation in U. Hence Eq. (1) says that T is compatible with 
the + operations in U and V. In other words, T is a homomorphism of the abelian 
groups U and V [cf. Definition 7.2, Chap. 1, and Theorem 7.1, Eq. (8.14)], and 
consequently T sends the zero of U into the zero of V and it sends −x into the 
inverse −T(x) of T(x). That is, T(0) = 0 and T(−x) = −T(x). In a similar way, 
Eq. (2) says that T is compatible with scalar multiplication. For ax on the left 
is scalar multiplication in U of x by a; and a · T(x) on the right is scalar 
multiplication in V of T(x) by a. 
Supposing still that T satisfies (1) and (2) above, let x and y be two vectors in U, 
and let a, b be two scalars in K. By (1) we have T(ax + by) = T(ax) + T(by). 


Applying (2) to both of these terms we get 

3.1    $T(a\mathbf{x} + b\mathbf{y}) = a \cdot T(\mathbf{x}) + b \cdot T(\mathbf{y})$

A very simple argument by mathematical induction allows us to deduce the more 
general formula 

3.2    $T(a^1 \mathbf{x}_1 + \cdots + a^r \mathbf{x}_r) = a^1 \cdot T(\mathbf{x}_1) + \cdots + a^r \cdot T(\mathbf{x}_r)$

for any vectors $\mathbf{x}_1, \ldots, \mathbf{x}_r$ in U and scalars $a^1, a^2, \ldots, a^r$ in K. Equation (3.2) 
will be used constantly in this chapter. Using the Einstein summation convention 
we can put it in the shorter form 

3.3    $T(a^i \mathbf{x}_i) = a^i \cdot T(\mathbf{x}_i)$

EXAMPLES  $K_n$ being the vector space defined in Sec. 4, Chap. 8, for any field K, 
let T: $K_n$ → K be defined by $T(\mathbf{x}) = x_1$ for any vector $\mathbf{x} = (x_1, \ldots, x_n)$ in $K_n$. 
Then T is linear, as is very easily seen. More generally, if $k_1, k_2, \ldots, k_m$ are any 
integers between 1 and n (inclusive), then the mapping T: $K_n$ → $K_m$ defined by 
$T(\mathbf{x}) = (x_{k_1}, x_{k_2}, \ldots, x_{k_m})$ for any $\mathbf{x} = (x_1, x_2, \ldots, x_n)$ is linear. The following 
mapping S: $K_2$ → $K_2$ is linear: let a, b, c, d be any four scalars, and for any 
$\mathbf{x} = (x_1, x_2)$ put $S(\mathbf{x}) = \mathbf{y} = (y_1, y_2)$, where $y_1 = ax_1 + bx_2$ and $y_2 = cx_1 + dx_2$. The 
mapping T of Theorem 9.2, Chap. 8, is linear. 
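
Conditions (1) and (2) of Definition 3.1 can be spot-checked for the mapping S above. The sketch below takes K to be the rational field, represented by Python's `Fraction`; the helper names `make_S`, `add`, and `scale` are our own. A check at a few sample vectors is of course not a proof of linearity.

```python
from fractions import Fraction

def make_S(a, b, c, d):
    """The mapping S(x1, x2) = (a*x1 + b*x2, c*x1 + d*x2) of the example."""
    def S(x):
        x1, x2 = x
        return (a * x1 + b * x2, c * x1 + d * x2)
    return S

def add(x, y):
    """Componentwise sum of two vectors given as tuples."""
    return tuple(p + q for p, q in zip(x, y))

def scale(k, x):
    """Scalar multiple k*x of a vector given as a tuple."""
    return tuple(k * p for p in x)

S = make_S(Fraction(2), Fraction(-1), Fraction(3), Fraction(5))
x = (Fraction(1), Fraction(2))
y = (Fraction(-3), Fraction(1, 2))
k = Fraction(7, 3)

print(S(add(x, y)) == add(S(x), S(y)))     # condition (1) at these vectors
print(S(scale(k, x)) == scale(k, S(x)))    # condition (2) at this scalar and vector
```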


DEFINITION 3.2  Let T: U → V be as in Definition 3.1. Then the kernel of T, 
abbreviated Ker T, is the set of all vectors in U that are mapped by T into the zero element 
of V. The image of T, abbreviated Im T, is the set of all vectors in V that are images 
under T of vectors in U. 

Hence an element x of U is in Ker T if and only if T(x) = 0. An element y of V 
is in Im T if and only if there is a vector x in U such that T(x) = y. In other 
words, as x runs through all the elements of U, T(x) runs through all the elements 
of Im T. In the first example T: $K_n$ → K mentioned above, Ker T consists of all 
vectors x of the form $\mathbf{x} = (0, x_2, \ldots, x_n)$ and those vectors form a subspace of 
$K_n$ of dimension n − 1. In the same example, Im T = K, because every element 
of K is the image under T of some x in $K_n$. 


DEFINITION 3.3  A linear mapping T: U → V is called an epimorphism if Im T = 
V, or in other words, if T maps U onto all of V. T is called a monomorphism if no 
two different elements of U are sent by T into the same element of V. T is called an 
isomorphism if it is both a monomorphism and an epimorphism, that is, if T is a 
one-to-one mapping of U onto V. 

The "component" mapping T of Theorem 9.2, Chap. 8, is an isomorphism V → 
$K_n$. The identity mapping V → V of any vector space is an isomorphism. We 
shall usually denote that mapping by I. Thus I(x) = x for any x in V. The 
mapping T: $K_n$ → K discussed in the example above is an epimorphism; it is not 
a monomorphism except for n = 1. 




A linear mapping T: V → V of a space to itself is often called an endomorphism; 
an isomorphism V → V is also called an automorphism. 

The following simple theorem will be used frequently. 

PROPOSITION 3.1  Let T: U → V be a linear mapping of two vector spaces over a field 
K. Then Ker T is a subspace of U, and T is a monomorphism if and only if Ker T 
is the subspace consisting of the zero element of U alone. Im T is a subspace of V; and, 
more generally, T maps every subspace of U onto a subspace of V. If T is an 
isomorphism, then the inverse mapping $T^{-1}$: V → U is also an isomorphism. 

The proposition follows very easily from the definitions, and its proof is left as an 
exercise. But let us show here, for example, that T maps any subspace U′ of U 
onto a subspace of V (see Definition 5.1, Chap. 8). To do so let V′ be the subset 
of V consisting of all vectors T(x), where x runs through U′. Take any two 
elements of V′. They can be written as T(x) and T(y), say, where x, y are in U′. We 
want to show that their sum T(x) + T(y) is also in V′. By Definition 3.1 the last 
expression is equal to T(x + y), and this is again an element of V′, by definition of 
the latter, because x + y is an element of U′ (since U′ is a subspace of U). 
Similarly, the product of any element of V′ by a scalar c in K is again in V′. Write 
the given element of V′ as T(x), with x in U′. Then c · T(x) = T(cx), by 
Definition 3.1, and this is in V′ because cx is in U′. 

DEFINITION 3.4  If T: U → V is a linear mapping, U having finite dimension, then 
the dimension of the subspace Ker T of U is called the nullity of T. The dimension of 
the subspace Im T of V is called the rank of T. 

EXAMPLE  The mapping T: $K_4$ → $K_2$ defined by $T(x_1, x_2, x_3, x_4) = (x_1, x_2)$ for any 
$(x_1, x_2, x_3, x_4)$ in $K_4$ is easily seen to be linear. Its kernel is plainly the set of all 
vectors of the type $(0, 0, x_3, x_4)$ in $K_4$, and so the nullity of T is 2. Clearly T maps 
$K_4$ onto all of $K_2$, hence is an epimorphism. The rank of T is 2. 

The mapping S: $K_2$ → $K_4$ defined by $S(x_1, x_2) = (0, x_1, x_2, 0)$ is linear. It is a 
monomorphism (hence of nullity zero), and its image is a two-dimensional subspace 
of $K_4$. Thus the rank of S is 2. 

THEOREM 3.2  Let T: U → V be a linear mapping of two vector spaces over a field K, 
and suppose that U has finite dimension. Then 

dim U = rank of T + nullity of T 
      = dim Im T + dim Ker T 

Proof. To fix notation, let dim U = n, let rank = r and nullity = s. 
We want to show that n = r + s. To do so select a base $\mathbf{u}_1, \ldots, \mathbf{u}_s$ for 
Ker T. By Theorem 9.3, Chap. 8, this set can be enlarged to a base for U 
by adding on n − s new vectors $\mathbf{v}_1, \ldots, \mathbf{v}_{n-s}$. Now let y be in Im T. 
By definition there is a vector x in U such that T(x) = y. Writing x in 
terms of our base in U we have, say, 

$\mathbf{x} = a^1 \mathbf{u}_1 + \cdots + a^s \mathbf{u}_s + b^1 \mathbf{v}_1 + \cdots + b^{n-s} \mathbf{v}_{n-s}$




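Applying T to x and recalling that $T(\mathbf{u}_1) = 0, \ldots, T(\mathbf{u}_s) = 0$, we get 

$T(\mathbf{x}) = \mathbf{y} = b^1 \cdot T(\mathbf{v}_1) + \cdots + b^{n-s} \cdot T(\mathbf{v}_{n-s})$

Therefore the n − s vectors $T(\mathbf{v}_1), \ldots, T(\mathbf{v}_{n-s})$ span Im T. They are 
moreover linearly independent. For if $c^1 \cdot T(\mathbf{v}_1) + \cdots + c^{n-s} \cdot T(\mathbf{v}_{n-s}) = 0$ 
for certain scalars $c^1, \ldots, c^{n-s}$, then $T(c^1 \mathbf{v}_1 + \cdots + c^{n-s} \mathbf{v}_{n-s}) = 0$, 
and therefore $c^1 \mathbf{v}_1 + \cdots + c^{n-s} \mathbf{v}_{n-s}$ is in Ker T, hence can be expressed 
as a linear combination of $\mathbf{u}_1, \ldots, \mathbf{u}_s$. But that would imply that the 
vectors $\mathbf{u}_1, \ldots, \mathbf{u}_s, \mathbf{v}_1, \ldots, \mathbf{v}_{n-s}$ are linearly dependent, contrary to 
hypothesis, unless the $c^j$ are all zero. Hence $T(\mathbf{v}_1), \ldots, T(\mathbf{v}_{n-s})$ form a 
base for Im T, and so r = n − s. Q.E.D. 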
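
Over a finite field the theorem can be confirmed by direct counting, without using the proof. The sketch below is an illustration under stated assumptions: K is taken to be the field of integers modulo a prime p, a linear mapping is specified by rows of coefficients (as in the mapping S of the examples in this section), and the function names are our own. Since a subspace of dimension s over this field has exactly $p^s$ elements, the theorem predicts that the number of kernel elements times the number of image elements is $p^n$.

```python
from itertools import product

def apply_map(rows, x, p):
    """Image of x under the mapping y_j = sum_i rows[j][i] * x_i, computed modulo p."""
    return tuple(sum(a * xi for a, xi in zip(row, x)) % p for row in rows)

def kernel_and_image_sizes(rows, p):
    """Count the elements of Ker T and of Im T by running through all of K_n."""
    n = len(rows[0])
    zero = (0,) * len(rows)
    kernel_count = 0
    image = set()
    for x in product(range(p), repeat=n):
        y = apply_map(rows, x, p)
        image.add(y)
        if y == zero:
            kernel_count += 1
    return kernel_count, len(image)

# T(x1, x2, x3, x4) = (x1, x2) over the integers modulo 3: nullity 2, rank 2.
ker, img = kernel_and_image_sizes([[1, 0, 0, 0], [0, 1, 0, 0]], 3)
print(ker, img, ker * img == 3 ** 4)
```

Brute-force enumeration is feasible only for small p and n, but it has the merit of computing the kernel and the image independently of one another.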


COROLLARY  Let T: U → V be a linear mapping of two vector spaces of the same 
finite dimension n. If T is a monomorphism (that is, Ker T = 0), then it must be an 
isomorphism. 
Proof. By assumption the nullity is zero, and so dim U = dim Im T. 
Therefore Im T is an n-dimensional subspace of the n-dimensional space V, 
whence Im T = V (cf. Exercise 1, Sec. 8, Chap. 8). 

Remark. This corollary is not true for spaces of infinite dimension. For example, 
let T be the mapping of $K_\infty$ to itself defined as follows: $T(x_1, x_2, x_3, x_4, \ldots) = 
(0, x_1, x_2, x_3, x_4, \ldots)$. Then T is obviously a monomorphism but not an 
epimorphism. 

EXERCISES 


1. Prove that a linear mapping T: U → V is a monomorphism if and only if T 
sends every finite set of linearly independent vectors in U into such a set in V. 

2. Let C[t] be the vector space of polynomials in an indeterminate t over the 
complex field (cf. Sec. 4, Chap. 8). Let D: C[t] → C[t] be the mapping defined by 
the formula 

$D(a_0 + a_1 t + a_2 t^2 + \cdots + a_r t^r) = a_1 + 2a_2 t + 3a_3 t^2 + \cdots + r a_r t^{r-1}$

(The mapping D will occur later; we shall call it the derivative mapping.) Show 
that D is a linear mapping. Letting $C[t]_n$ denote the subspace consisting of 
polynomials of degree < n, it is clear that D maps $C[t]_n$ into itself. Let $D_n$ denote the 
operation D restricted to the subspace $C[t]_n$. Find the image and kernel of $D_n$ and 
its rank and nullity. Work the same problem with C replaced by the finite field 
Z, (see See. 10, Chap. 2), taking n = 7 and n = 10. 

3. Let T: U → V and S: V → W be linear mappings of the indicated vector 
spaces. Prove that the composition S ∘ T, mapping U to W, is a linear mapping. 
Prove that if S and T are monomorphisms, then so is S ∘ T. Prove that if S and T 
are epimorphisms, then so is S ∘ T. 

4. Let T: $Q_3$ → $Q_3$ be the mapping defined as follows: If $\mathbf{x} = (x_1, x_2, x_3)$, then 
$T(\mathbf{x}) = \mathbf{y} = (y_1, y_2, y_3)$, where $y_1 = 3x_1 - 2x_2 + x_3$, $y_2 = 2x_1 + x_2 - x_3$, and $y_3 =$ 




$2x_1 - 6x_2 + 4x_3$. Prove that T is an endomorphism of $Q_3$, and determine bases 
for its kernel and image. Find the rank and nullity of T. 

5. Let T: U → V be a linear mapping of two vector spaces, and let V′ be a 
subspace of V. Let U′ consist of all elements of U which are mapped by T into V′. 
Prove that U′ is a subspace of U. 


4. Operations on linear mappings 


It is possible to combine linear mappings in various ways to obtain new linear 
mappings. These operations are of great importance and will be used constantly 
in what follows. 

First of all, if T: U → V and S: V → W are linear mappings of vector spaces 
over a field K, then their composition S ∘ T is a mapping from U to W. It is 
defined, as usual, by the formula 

4.1    (S ∘ T)(x) = S(T(x))    for any x in U 

It is immediate that S ∘ T is again a linear mapping (see Exercise 3 above). Hence, 
composition of linear mappings yields linear mappings. Furthermore, the 
operation is associative, for that is true of any mappings, linear or not. To put it in 
symbols, let L: W → Z be another linear mapping of vector spaces over K. Then 

4.2    L ∘ (S ∘ T) = (L ∘ S) ∘ T 

To prove this it is only necessary to show that the mappings indicated on both 
sides of (4.2) send an arbitrary element of U into the same element of Z, and that 
follows trivially from (4.1). 

We now define two other operations on linear mappings. 


DEFINITION 4.1 If T: U → V is a linear mapping of two vector spaces over a field K,
and if a is any element of K, then aT will denote the mapping of U to V that sends an
arbitrary element x of U into a · T(x). In symbols,


(4.3)    (aT)(x) = a · T(x)    for any x in U


If T’: U → V is a second linear mapping, then T + T’ will denote the mapping of
U to V that sends an arbitrary element x of U into T(x) + T’(x). That is,


(4.4)    (T + T’)(x) = T(x) + T’(x)    for any x in U


It is a very simple matter to see that aT and T + T’ are again linear mappings 
of U to V. The operations defined above are connected by distributive laws, as 
the following proposition shows. 


PROPOSITION 4.1 Let U, V, and W be vector spaces over a field K. Let T, T’ be linear
mappings of U to V; let S, S’ be linear mappings of V to W; and let a be any scalar.
Then the following equations hold:


(4.5)
S ∘ (T + T’) = S ∘ T + S ∘ T’
(S + S’) ∘ T = S ∘ T + S’ ∘ T
a(T + T’) = aT + aT’
a(S ∘ T) = (aS) ∘ T = S ∘ (aT)


Proof. We must show that the mappings standing on either side of any
of the equations above send an arbitrary element x of U into the same
thing. By (4.1) the mapping on the left of the first equation sends x into
S[(T + T’)(x)]. By (4.4) this is equal to S[T(x) + T’(x)]. Since S is linear,
this is the same as S(T(x)) + S(T’(x)). On the other hand, the mapping
on the right of the first Eq. (4.5) sends x into S ∘ T(x) + S ∘ T’(x), by (4.4),
and this is equal to S(T(x)) + S(T’(x)), by (4.1). The other equations are
verified in a similar way.
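
The equations (4.5) can also be spot-checked numerically. The following is a minimal Python sketch, not part of the original text: linear mappings are modeled as functions on tuples of rationals, and the helper names `compose`, `add`, `scale` as well as the sample maps are illustrative assumptions. A check at a single vector is of course no substitute for the proof.

```python
from fractions import Fraction

def compose(S, T):
    """S o T as in (4.1): x -> S(T(x))."""
    return lambda x: S(T(x))

def add(T1, T2):
    """T1 + T2 as in (4.4): x -> T1(x) + T2(x)."""
    return lambda x: tuple(p + q for p, q in zip(T1(x), T2(x)))

def scale(a, T):
    """aT as in (4.3): x -> a * T(x)."""
    return lambda x: tuple(a * p for p in T(x))

# Hypothetical linear maps on pairs of rationals, chosen only for illustration.
T = lambda x: (x[0] + 2 * x[1], 3 * x[1])
T_prime = lambda x: (x[1], x[0] - x[1])
S = lambda x: (2 * x[0] - x[1], x[0])

x = (Fraction(1, 2), Fraction(3))
a = Fraction(5, 7)

# First equation of (4.5): S o (T + T') = S o T + S o T'
first_left = compose(S, add(T, T_prime))(x)
first_right = add(compose(S, T), compose(S, T_prime))(x)

# Fourth equation of (4.5): a(S o T) = (aS) o T = S o (aT)
fourth = [scale(a, compose(S, T))(x),
          compose(scale(a, S), T)(x),
          compose(S, scale(a, T))(x)]
```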


In the set of all linear mappings U → V we have defined a binary operation +
in (4.4) and an operation of scalar multiplication in (4.3). We now have the
following proposition: 


PROPOSITION 4.2 Let U and V be two vector spaces over a field K. Then the set of all
linear mappings of U to V, with the operations defined in (4.3) and (4.4), is itself a
vector space over K. It is denoted by† Hom (U, V).

It is an entirely straightforward matter to verify that the axioms of Sec. 2, Chap.
8, hold for Hom (U, V), and that is left as an exercise. We mention that the zero
element of Hom (U, V) is the mapping of U to V that sends every element of U
into the zero element of V. In general we shall not have much to do with Hom
(U, V) itself, but the operations in it will occur frequently. For example, if
$T_1, T_2, \ldots, T_k$ are any linear mappings from U to V, and if $a_1, a_2, \ldots, a_k$ are
any scalars in K, then $a_1T_1 + \cdots + a_kT_k$ is an element of Hom (U, V), that is,
a linear mapping from U to V. By (4.3) and (4.4) it sends any x in U into the
vector


$a_1T_1(x) + \cdots + a_kT_k(x)$


in V.

There are, however, two particularly important special cases of the foregoing,
namely Hom (U, U) and Hom (U, K). Let us consider first Hom (U, U). It
consists of all linear mappings of U to itself. If S and T are two such mappings
of U, then so are the mappings S + T and S ∘ T defined by (4.4) and (4.1) above.
Hence sum and product operations are defined in Hom (U, U), which, with these
operations, is a ring. We omit the verifications here but commend them to the
reader as an exercise. For example, the distributive axiom for rings [axiom (4),
Definition 2.1, Chap. 2] holds in Hom (U, U), by (4.5) above.


† Hom comes from homomorphism. A homomorphism of two vector spaces is the same
thing as a linear mapping. 




But Hom (U, U) is also a vector space over K, by Proposition 4.2. Referring 
now to Sec. 4, Chap. 8, there follows immediately the next theorem: 


THEOREM 4.3 Let U be a vector space over a field K, and let Hom (U, U) denote the
set of all linear mappings of U to itself. Then that set, equipped with the operations
defined in (4.1), (4.3), and (4.4), is a K-algebra.

As pointed out above, the zero element of Hom (U, U) is the mapping that sends
every element of U into 0. The unit element of the ring is the identity mapping I
of U. If T is any element of Hom (U, U), then, as in any ring, we write $T^2$ for
T ∘ T, $T^3$ for T ∘ T ∘ T, and so on. These are all linear mappings of U to itself, and
therefore so is any linear combination of them with coefficients in K, say


(4.6)    $a_0I + a_1T + a_2T^2 + \cdots + a_kT^k$


where again I (we can also denote it by $T^0$) is the identity mapping of U. Such
polynomials are of crucial importance and will occur frequently later on.
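
A polynomial in T acts on a vector by accumulating the successive images x, T(x), T(T(x)), and so on. The following Python sketch illustrates this under stated assumptions: the function name `polynomial_in` and the sample endomorphism N are hypothetical and chosen only for illustration, and vectors are modeled as tuples.

```python
def polynomial_in(T, coeffs):
    """Return the mapping a_0 I + a_1 T + ... + a_k T^k for coeffs = [a_0, ..., a_k]."""
    def P(x):
        total = tuple(0 for _ in x)
        power = tuple(x)              # T^0(x) = x
        for a in coeffs:
            total = tuple(s + a * p for s, p in zip(total, power))
            power = T(power)          # pass from T^r(x) to T^(r+1)(x)
        return total
    return P

# Hypothetical endomorphism of pairs: N(x1, x2) = (x2, 0), so that N o N = 0.
N = lambda x: (x[1], 0)

P = polynomial_in(N, [2, 3, 5])       # the mapping 2I + 3N + 5N^2
image = P((1, 4))                     # 2*(1, 4) + 3*(4, 0) + 5*(0, 0)
```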

It is to be noted that all the linear transformations aI of U (a any scalar) form
a subring of Hom (U, U) which is isomorphic in an obvious way to the field K.

Now let us have a look at Hom (U, K). By our definition above this consists of 
all linear mappings of U to the field of scalars K. According to Proposition 4.2 it 
is a vector space over the field K. We shall denote Hom (U, K) by the symbol U*. 
In the case of a finite-dimensional vector space U, the new space U* is called the 
dual vector space of U. 

If f is an element of U*, then it is required to satisfy the following conditions:


(4.7)
f(x) is an element of K    for every x in U
f(x + y) = f(x) + f(y)    for any x, y in U
f(ax) = a · f(x)    for any x in U and a in K


The last two equations are simply the requirements of Definition 3.1. Instead of
writing f(x), it is usually more convenient to introduce the notation


(4.8)    f(x) = <f, x>


which has the advantage of giving f and x a kind of equal status in the symbolism.
The quantity (4.8) is called the inner product of f and x. It is easy to verify that
this inner product is bilinear, meaning that the following equations hold for any
f, g in U*, for any x, y in U, and for any c in K:


(4.9)
<f, x + y> = <f, x> + <f, y>
<f + g, x> = <f, x> + <g, x>
<cf, x> = <f, cx> = c · <f, x>


In other words, the operation symbolized by < , > is linear in its dependence
upon both of the entries. The first equation in (4.9) is the same as the first
equation of (4.7); the second equation in (4.9) is just (4.4) applied to f and g; the
third equation above combines (4.3) and the second equation of (4.7).




The space U* will not play a great part until Chaps. 14 and 16, but we shall 
come across it from time to time. 


THEOREM 4.4 Let T: U → V be a linear mapping of two vector spaces over a field K.
Then there is a unique linear mapping T*: V* → U* such that <T*(g), x> =
<g, T(x)> for any g in V* and any x in U. In fact, T*(g) = g ∘ T.
Proof. Any g in V* is by definition a linear mapping V → K. Therefore
the composition g ∘ T is a linear mapping of U to K, hence is an element
of U*. Therefore the operation g → g ∘ T is a mapping of V* to U*,
which we denote by T*. Then <T*(g), x> = <g ∘ T, x>, and by (4.8)
this is g ∘ T(x), which by (4.1) is the same as g(T(x)). From (4.8) again,
this is equal to <g, T(x)>. Therefore T* satisfies the equations appearing
in the theorem. Next, T* is linear. Let h be another element of V*. We must
show that T*(g + h) = T*(g) + T*(h). By definition of T*, the left member
here is the same as (g + h) ∘ T, which, by (4.5), is equal to g ∘ T +
h ∘ T, and this is in turn equal to T*(g) + T*(h), by definition of T*.
Furthermore we must show that T*(cg) = c · T*(g) for any g in V* and
any scalar c. But this equation is the same as (cg) ∘ T = c(g ∘ T), by definition
of T*, and the equation is therefore true, by (4.5). Finally, T* is
unique: Let $T_1^*$ be another linear mapping from V* to U* such that
<$T_1^*$(g), x> = <g, T(x)> for all g, x. Putting S = T* − $T_1^*$, we have
<S(g), x> = <T*(g) − $T_1^*$(g), x>, by definition of T* − $T_1^*$, that is, by
(4.3) and (4.4). From (4.9) the last expression is equal to <T*(g), x> −
<$T_1^*$(g), x>, and by assumption this is <g, T(x)> − <g, T(x)>. Hence,
<S(g), x> = 0 for all g and x. Then S(g) is the zero mapping of U to K,
for any g. That is, S(g) is the zero element of U*. Therefore S maps all
of V* into the zero of U*, and so S = 0, or T* = $T_1^*$. Q.E.D.


DEFINITION 4.2 The mapping T* of Theorem 4.4 is called the transpose of T. 
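
The rule T*(g) = g ∘ T translates directly into code. The sketch below is only an illustration under stated assumptions: mappings are modeled as Python functions, and the sample mapping T and the linear form g are hypothetical choices. It checks the defining equation <T*(g), x> = <g, T(x)> at one vector.

```python
def transpose_of(T):
    """Return T*, which sends a linear form g on V to the linear form g o T on U."""
    return lambda g: (lambda x: g(T(x)))

# Hypothetical linear mapping from pairs to triples.
T = lambda x: (2 * x[0] + x[1], 4 * x[0], -x[0] - 4 * x[1])

# Hypothetical linear form on triples, that is, an element g of V*.
g = lambda y: y[0] - 2 * y[1] + 3 * y[2]

x = (5, 2)
T_star = transpose_of(T)
left = T_star(g)(x)      # <T*(g), x>
right = g(T(x))          # <g, T(x)>
```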


EXERCISES 

1. Let T be the endomorphism of $Q_3$ defined in Exercise 4, Sec. 3. Where do
the two mappings $I + 4T^2$ and $2I - 8T + 5T^2 - T^3$ send the vectors (1, 0, 2) and
(2, -1, 1)? Determine the rank and nullity of the first of the two mappings.

2. Let $D_0$ be the endomorphism of $C[t]_n$ defined in Exercise 2, Sec. 3. What is
the mapping Do"? Find bases for the kernel and image of the operator 21 — 
Do + 3Do". 

3. Let W be a K-algebra, where K is a field. Let T be a linear mapping of W
to itself, W being considered as a vector space over K (we do not assume that T is
compatible with the product operation defined in W). Let u be an element of W,
and define a new mapping uT of W to itself by the rule that uT sends an arbitrary
element x of W into u · T(x). Show that uT is again a linear mapping of W [this
operation generalizes (4.3)].




4. Let D be the derivative mapping of C[t] defined in Exercise 2, Sec. 3. If f(t)
is any polynomial in C[t], then f(t)D, as defined in the preceding exercise, is a
new linear mapping of C[t] to itself, that is, it maps an arbitrary polynomial g(t)
into the polynomial f(t) D[g(t)]. A special instance of this is the operator $(1 - t^2)D$.
The composition of D and $(1 - t^2)D$ is then the operator $D(1 - t^2)D$. It is of
considerable importance in some applications, and we shall encounter it in later
exercises. Where does it send the polynomials 2 —¢+@ and l —#*+—-#@+H%—-8?
Show that the operator is the same as $(1 - t^2)D^2 - 2tD$.

5. Let S and T be two isomorphisms of a vector space U. Show that
$(S \circ T)^{-1} = T^{-1} \circ S^{-1}$.

6. Let T: U → V and S: V → W be linear mappings of vector spaces, and let
T* resp. S* be their transposes. Prove that (S ∘ T)* = T* ∘ S*.

7. Let U and V be two vector spaces over a field K. Show that the operation 
that sends a linear mapping T of U to V into its transpose T* is a linear mapping 
from Hom (U, V) to Hom (V*, U*). Show that if T is an epimorphism, then T* 
is a monomorphism. 

8. U being a vector space, let GL(U) denote the set of all automorphisms of U. 
Prove that this set, with composition of mappings as binary operation, is a group 
(called the general linear group on U). 


5. Linear transformations and matrices 


We shall now show how to transcribe calculations with linear mappings into
calculations with scalars. This is a very important step for many purposes, and it will
lend an appearance of concreteness to some of the things we have been discussing.
It is to be remarked, however, that many arguments involving linear mappings are
most easily effected without the use of matrices.


PROPOSITION 5.1 Let U and V be vector spaces over a field K, and suppose that U
has finite dimension n. Let $u_1, \ldots, u_n$ be a base for U, and let $y_1, \ldots, y_n$ be
arbitrary vectors in V. Then there is one and only one linear mapping T: U → V
such that


$T(u_1) = y_1, \quad T(u_2) = y_2, \quad \ldots, \quad T(u_n) = y_n$

and every linear transformation U → V is determined in this way by a suitable choice
of $y_1, \ldots, y_n$.

Proof. Since $u_1, \ldots, u_n$ form a base for U, every x in U can be
expressed uniquely as a linear combination


(5.1)    $x = x^1u_1 + \cdots + x^nu_n$


with $x^1, \ldots, x^n$ in K. Now $y_1, \ldots, y_n$ being given in V, we define T
by the formula


(5.2)    $T(x) = x^1y_1 + \cdots + x^ny_n$


Since there is only one way to write x in the form (5.1), it is clear that T(x)
as defined in this manner is uniquely determined by x. It is a trivial matter
to verify that T is indeed a linear mapping. If now we take $u_1$, say, for x,
then (5.1) becomes


$u_1 = 1 \cdot u_1 + 0 \cdot u_2 + \cdots + 0 \cdot u_n$


and so (5.2) gives $T(u_1) = y_1$. Similarly, $T(u_j) = y_j$ for j = 1, ..., n.

On the other hand, if T: U → V is any given linear mapping, and if we
define $y_j$ by $T(u_j) = y_j$ (j = 1, ..., n), then from (3.3) we get $T(x) =
x^1y_1 + \cdots + x^ny_n$, which is the same as (5.2). This shows that our
definition of T(x) by (5.2) is the only one possible. Q.E.D.


The proposition says simply that a linear transformation U → V is completely
determined by its effect on a set of base vectors in U and that effect may be
prescribed arbitrarily. As a simple application we have the following corollary:
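
Formula (5.2) can be sketched in a few lines of Python. This is a hedged illustration only: vectors of U are represented by their component tuples relative to the chosen base, the function name `linear_map_from_images` is hypothetical, and the prescribed images are arbitrary sample data.

```python
def linear_map_from_images(images):
    """Return the mapping T with T(u_i) = images[i], following (5.2).

    A vector x of U is given by its components (x^1, ..., x^n) relative
    to the base u_1, ..., u_n; each image y_i is a tuple of length m.
    """
    m = len(images[0])
    def T(x):
        return tuple(sum(xi * y[j] for xi, y in zip(x, images)) for j in range(m))
    return T

# Arbitrary sample images y_1, y_2, y_3 for a base of a 3-dimensional space.
images = [(1, 1), (0, 2), (3, -1)]
T = linear_map_from_images(images)

on_base = [T((1, 0, 0)), T((0, 1, 0)), T((0, 0, 1))]   # recovers y_1, y_2, y_3
general = T((2, 1, 1))                                  # 2*y_1 + 1*y_2 + 1*y_3
```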


COROLLARY Let U be a vector space of finite dimension n over a field K. Then the
dual vector space U* also has dimension n.


Proof. Let $u_1, \ldots, u_n$ be a base for U, and for each j = 1, ..., n
let $u^j$ denote the linear mapping of U to K determined by the following
conditions:


(5.3)    $\langle u^j, u_i \rangle = \begin{cases} 1 & \text{if } j = i \\ 0 & \text{if } j \neq i \end{cases} \qquad (i = 1, \ldots, n)$


We recall that the left member of this equation stands for $u^j(u_i)$. According
to Proposition 5.1, for given j there is one and only one linear mapping
U → K for which (5.3) holds. Now let f be any element of U*, and set
$\langle f, u_i \rangle = f_i$ (i = 1, ..., n). Each $u^j$ is an element of U*, and therefore,
by the general definitions given in Sec. 4, so is $f_1u^1 + f_2u^2 + \cdots + f_nu^n$.
Furthermore, this mapping sends $u_i$ into $f_i$, by (5.3). It therefore
has precisely the same effect as f on the given base of U and consequently
must coincide with f, by the proposition above. Hence, using the Einstein
summation convention, we have


$f = f_ju^j$


The elements $u^1, \ldots, u^n$ therefore span the space U*. From (5.3) it is
very easily seen that they are linearly independent and consequently form
a base for U*, which accordingly has dimension n. If $x = x^iu_i$ is an
arbitrary element of U, then using (4.9) we get


$\langle f, x \rangle = \langle f_ju^j, x^iu_i \rangle = f_jx^i \langle u^j, u_i \rangle$


and by (5.3) this reduces to

(5.4)    $\langle f, x \rangle = f_ix^i$

the summation convention being used throughout.
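
The dual base and the expansion of a linear form in terms of it can be illustrated as follows. This is a minimal sketch under stated assumptions: vectors are represented by component tuples relative to the given base, so that the base vectors become unit tuples; the helper names and the sample form f are hypothetical.

```python
def dual_base(n):
    """Return [u^1, ..., u^n]; u^j picks out the j-th component, so u^j(u_i) is 1 if i == j, else 0."""
    return [lambda x, j=j: x[j] for j in range(n)]

def unit_vector(i, n):
    """Component tuple of the base vector u_i (zero-based index i)."""
    return tuple(1 if k == i else 0 for k in range(n))

n = 3
u_upper = dual_base(n)

# Hypothetical element f of U*.
f = lambda x: 3 * x[0] - x[1] + 2 * x[2]

# Components f_i = <f, u_i> of f relative to the dual base.
f_lower = [f(unit_vector(i, n)) for i in range(n)]

x = (1, 4, 2)
expansion = sum(f_lower[j] * u_upper[j](x) for j in range(n))   # (f_j u^j)(x)
pairing = sum(f_i * x_i for f_i, x_i in zip(f_lower, x))        # f_i x^i, as in (5.4)
```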


Remark. In Sec. 2 it was mentioned that it is sometimes desirable to reverse the
index convention. It is precisely in connection with the dual vector space that
this situation arises. In the original space U we used lower indices on base vectors
and upper indices on components. A brief examination of the foregoing shows
that it is entirely natural to use upper indices on the base $u^1, \ldots, u^n$ that we
found for U* and lower indices for the components $f_1, \ldots, f_n$ of f relative to that
base. The base defined by (5.3) for U* is said to be dual to the given base in U.


Now let U and V be vector spaces over a field K of dimensions n and m,
respectively. Let T: U → V be a linear mapping. If $u_1, \ldots, u_n$ is a base for U, and
if $v_1, \ldots, v_m$ is a base for V, then each element $T(u_i)$ can be expressed uniquely
as a linear combination of the base elements in V, say


(5.5)    $T(u_i) = t^1_iv_1 + t^2_iv_2 + \cdots + t^m_iv_m = t^j_iv_j \qquad (i = 1, \ldots, n)$


where on the right we have used the summation convention. The coefficients $t^j_i$
are elements of the field K, and there are mn of them. It is very convenient to
think of them as arranged in a rectangular array


(5.6)    $\begin{pmatrix} t^1_1 & t^1_2 & \cdots & t^1_n \\ t^2_1 & t^2_2 & \cdots & t^2_n \\ \vdots & \vdots & & \vdots \\ t^m_1 & t^m_2 & \cdots & t^m_n \end{pmatrix}$


Such an array is called an m × n matrix, and the entries $t^j_i$ are called its coefficients.
Observe that the upper index j indicates the row containing $t^j_i$, and the lower index
tells which column it is in. We shall sometimes abbreviate the matrix (5.6) by
the symbol $(t^j_i)$ or by the boldface letter t. (Naturally for different matrices we
shall use different letters; the particular choice of indices is a matter of indifference,
of course.)


DEFINITION 5.1 The matrix (5.6) is called the matrix of T relative to the base pair
$\{u_i\}$, $\{v_j\}$ in U and V.

From the matrix t of the mapping T it is easy to calculate the components of
T(x) for any x in U. To do so, let $x = x^1u_1 + x^2u_2 + \cdots + x^nu_n = x^iu_i$. By
(3.3) we have then

$T(x) = T(x^iu_i) = x^iT(u_i)$

Substituting (5.5) in the last expression, we get


$T(x) = x^it^j_iv_j$


The coefficient of $v_j$ here is $x^it^j_i = x^1t^j_1 + x^2t^j_2 + \cdots + x^nt^j_n$, which we prefer
to write as $t^j_ix^i$.


THEOREM 5.2 If $t = (t^j_i)$ is the matrix of the linear mapping T: U → V relative to
the base pair $\{u_i\}$, $\{v_j\}$, and if $x^1, \ldots, x^n$ are the components of an element x of U
relative to $\{u_i\}$, then the components of T(x) relative to $\{v_j\}$ are


(5.7)    $t^j_ix^i \qquad (j = 1, \ldots, m)$


EXAMPLE Let U, V be vector spaces of dimension 2 resp. 3 over the rational
field Q. Let $u_1, u_2$ and $v_1, v_2, v_3$ be bases for them, and let T be the linear mapping
of U to V for which


$T(u_1) = 2v_1 + 4v_2 - v_3$
$T(u_2) = v_1 - 4v_3$


Then the corresponding matrix of T is the 3 × 2 matrix


$\begin{pmatrix} 2 & 1 \\ 4 & 0 \\ -1 & -4 \end{pmatrix}$

(It is important not to mix up the rows and columns!) For the vector $x = 5u_1 +
2u_2$ in U, the components of T(x) are, by (5.7), $t^1_ix^i$, $t^2_ix^i$, $t^3_ix^i$, or 2 · 5 + 1 · 2,
4 · 5 + 0 · 2, −1 · 5 + (−4) · 2. Thus $T(x) = 12v_1 + 20v_2 - 13v_3$. This is easily
verified directly.
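
The computation by (5.7) is easy to mechanize. In the Python sketch below, which is an illustration and not part of the original text, a matrix is stored as a list of rows, and the function name `apply_matrix` is a hypothetical choice. The data are those of a 3 × 2 matrix applied to the components (5, 2).

```python
def apply_matrix(t, x):
    """Components of T(x) by (5.7): the j-th component is the sum over i of t[j][i] * x[i]."""
    return [sum(t_ji * x_i for t_ji, x_i in zip(row, x)) for row in t]

# Each column holds the components of the image of one base vector of U.
t = [[2, 1],
     [4, 0],
     [-1, -4]]

y = apply_matrix(t, [5, 2])            # components of T(5u_1 + 2u_2)
first_column = apply_matrix(t, [1, 0]) # components of T(u_1)
```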

Equation (5.5) tells us how to calculate the effect of T on the base elements
$u_1, \ldots, u_n$ from the matrix (5.6), and therefore T is completely determined by
the matrix once the bases in U and V are given, by Proposition 5.1. A different 
choice of bases in U and V results in a different matrix (5.6) for T. We shall soon 
see how different matrices for the same linear mapping are related. 


REMARK. Any m × n matrix $(t^j_i)$ with coefficients in a field K is the matrix of
some linear transformation. For we have only to take any two spaces of the proper
dimension, for example, $K_n$ and $K_m$, along with bases for each, say $u_1, \ldots, u_n$
and $v_1, \ldots, v_m$, respectively. If we then define $T(u_i)$ by Eq. (5.5), we shall
automatically determine a linear transformation T of the two spaces, by Proposition
5.1; and clearly $(t^j_i)$ will be its matrix with respect to the base pair chosen.


Matrices of particular importance are those for which either m or n equals 1.
A 1 × n matrix consists of a single row of n scalars and therefore has the form


$a = (a_1, a_2, \ldots, a_n)$


Matrices of this type are simply vectors in the space $K_n$ of n-tuples. We shall call
them row vectors.




On the other hand, an m × 1 matrix consists of a single column of m scalars and
has the form


$b = \begin{pmatrix} b^1 \\ b^2 \\ \vdots \\ b^m \end{pmatrix}$


Matrices of this type will be called column vectors. All such column vectors with
coefficients in K constitute a set which, apart from the notation, is the same as $K_m$.
We shall call it $K^m$ in order to indicate the vertical arrangement.

It is then natural to denote by $K^m_n$ the set of all m × n matrices with coefficients
in the field K. Further, we denote by $t^j$ the jth row of (5.6). It is an element of
$K_n$. And we denote by $t_i$ the ith column of (5.6). It is an element of $K^m$. Thus,
if we want to consider the matrix (5.6) in terms of its columns, we can write it as


$t = (t_1 \; t_2 \; \cdots \; t_n)$


If we want to express (5.6) in terms of its rows, we can write it as


$t = \begin{pmatrix} t^1 \\ t^2 \\ \vdots \\ t^m \end{pmatrix}$


To summarize, the general notational rules that we shall adopt are as follows:


(5.10)
$t^j_i$ denotes the element in the jth row and ith column of the matrix t, or
briefly, the (j, i) element of t.
$t^j$ denotes the jth row of the matrix t.
$t_i$ denotes the ith column of t.
$K^m_n$ denotes the set of all m × n matrices with coefficients in the field K,
except that we write $K^m$ for $K^m_1$, and $K_n$ for $K^1_n$.


Of course 1 × 1 matrices are just scalars, and boldface characters will not be
used for them.

Observe that if T = 0, that is, if T maps every element of U into the zero element
of V, then all the coefficients $t^j_i$ in (5.6) must be zero. In this case t is said to be
the m × n zero matrix, denoted simply by 0.

In Definition 5.1 two bases are involved, one for U and one for V. If the two
spaces happen to be the same, then of course there is no necessity for two different
bases. If the two bases are taken to be the same, say $\{u_i\}$, $\{u_i\}$, we shall simply
call t the matrix of T relative to the base $\{u_i\}$. Observe that in the case of an
endomorphism, the matrix t will be n × n. Such a matrix is called square.

Let us now determine the matrix of the identity mapping I of the space U. For
the given base $\{u_i\}$ we have


(5.11)    $I(u_i) = u_i = 0 \cdot u_1 + 0 \cdot u_2 + \cdots + 1 \cdot u_i + \cdots + 0 \cdot u_n$


and from the general definition above it follows that the matrix of I relative to
the base $\{u_i\}$ is

(5.12)    $I_n = \begin{pmatrix} 1 & 0 & 0 & \cdots & 0 \\ 0 & 1 & 0 & \cdots & 0 \\ \vdots & & & & \vdots \\ 0 & 0 & 0 & \cdots & 1 \end{pmatrix}$

where naturally 1 denotes the unit element of the field of scalars K. $I_n$ is called the
n × n unit matrix. We shall sometimes denote it simply by I. The matrices of the
zero and the identity mappings U → U do not depend upon the particular choice
of base in U. But for all other endomorphisms of U the matrix depends in an
essential way on the choice of base.

It is often useful to have a symbol for the coefficients of the n × n unit matrix,
and we shall indicate the element in the jth row and ith column of (5.12) by the
symbol $\delta^j_i$.

(5.13)    $\delta^j_i = \begin{cases} 1 & \text{if } i = j \\ 0 & \text{if } i \neq j \end{cases}$

The symbol is called the Kronecker delta.
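
As a small illustration, the Kronecker delta and the unit matrix built from it can be written as follows. This is a Python sketch with hypothetical function names; indices start at 0 here, as is usual in Python, rather than at 1.

```python
def delta(j, i):
    """Kronecker delta: 1 if the two indices agree, 0 otherwise."""
    return 1 if i == j else 0

def unit_matrix(n):
    """The n x n unit matrix as a list of rows; its (j, i) element is delta(j, i)."""
    return [[delta(j, i) for i in range(n)] for j in range(n)]

I3 = unit_matrix(3)

# Summing delta(k, j) * a[j] over j simply selects the entry a[k].
a = [7, 11, 13]
selected = [sum(delta(k, j) * a[j] for j in range(3)) for k in range(3)]
```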


EXERCISES 
1. Let U, V, W be vector spaces of dimensions 2, 3, 4, respectively, over Q.
Let $\{u_1, u_2\}$, $\{v_1, v_2, v_3\}$, and $\{w_1, w_2, w_3, w_4\}$ be bases for them. Let T: U → V
and S: V → W be the mappings defined by
T(u:) = 8¥1 — 2v2 + ¥3 
Tus) = 4¥, + v2 
S(vi) = wi + we + wa — We 
S(v2) = 2w — we + ws 
S(vs) = 2w, — wa + 44 
Write down the corresponding matrices of T, S, and S ∘ T.
2. With S, T as above, find the components of S ∘ T(x) relative to $\{w_k\}$, where
x = Su, — oe. 
3. Find bases for Ker T, Ker S, and Ker (S ∘ T). What is the rank of S ∘ T?
4. With T as in Exercise 1, write down the matrix of its transpose T* relative to
the dual base pair $\{v^j\}$ and $\{u^i\}$. Do the same for S*.
5. Referring to Exercise 2, Sec. 4, find a base for $C[t]_n$ and give the matrix of $D_0$
relative to that base. Give the matrix of 21 — Dp + 3D,° relative to that base. 




6. Operations on matrices 


We shall now see how the operations on linear transformations discussed in Sec. 4
can be expressed in terms of matrices. We start with the operation (4.1) of
composition.

Let U, V, W be vector spaces of dimensions n, m, r, respectively, over a field K;
and let T: U → V and S: V → W be linear mappings. Choose bases $\{u_i\}$, $\{v_j\}$,
and $\{w_k\}$ for the three spaces. Let t be the matrix of T relative to the base pair
$\{u_i\}$, $\{v_j\}$; let s be the matrix of S relative to the base pair $\{v_j\}$, $\{w_k\}$; and finally let q
be the matrix of S ∘ T relative to the base pair $\{u_i\}$, $\{w_k\}$. Then, according to
Definition 5.1, we have


(6.1)
$T(u_i) = t^j_iv_j = t^1_iv_1 + \cdots + t^m_iv_m$
$S(v_j) = s^k_jw_k = s^1_jw_1 + \cdots + s^r_jw_r$
$S \circ T(u_i) = q^k_iw_k = q^1_iw_1 + \cdots + q^r_iw_r$


Apply S to the first of these equations. Using (3.3) we get

$S \circ T(u_i) = S(t^j_iv_j) = t^j_iS(v_j)$

Substituting from the second equation we get†

(6.2)    $S \circ T(u_i) = t^j_is^k_jw_k$


The coefficient of $w_k$ here is the sum $t^j_is^k_j$, which we prefer to write as $s^k_jt^j_i$
(legitimate because multiplication in K is commutative). Comparing (6.2) with the last
equation of (6.1) we conclude (Proposition 6.1, Chap. 8) that


(6.3)    $q^k_i = s^k_jt^j_i = s^k_1t^1_i + s^k_2t^2_i + \cdots + s^k_mt^m_i$

We take this equation as the basis of the following definition:
We take this equation as the basis of the following definition: 


DEFINITION 6.1 Let s be an r × m matrix and let t be an m × n matrix, both with
coefficients in a field K. Then the r × n matrix q whose coefficients are given by (6.3)
is called the product st of s and t.

Hence, with this definition we can state the following: If s is the matrix of
S: V → W with respect to the base pair $\{v_j\}$, $\{w_k\}$, and if t is the matrix of T:
U → V with respect to the base pair $\{u_i\}$, $\{v_j\}$, then the product matrix st is the
matrix of S ∘ T relative to the base pair $\{u_i\}$, $\{w_k\}$.

Observe that the element $q^k_i$ in the kth row and ith column of st is obtained from
the kth row of s and the ith column of t. A matrix product is defined only when the
number of columns in the first factor is equal to the number of rows in the second factor.


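EXAMPLE


$\begin{pmatrix} 3 & 1 & 2 \\ 0 & 1 & 1 \\ 1 & 2 & 0 \end{pmatrix} \begin{pmatrix} 1 & 4 \\ 2 & 2 \\ 1 & 0 \end{pmatrix} = \begin{pmatrix} 7 & 14 \\ 3 & 2 \\ 5 & 8 \end{pmatrix}$


In this case the factors on the left cannot be interchanged.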
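
The product rule (6.3) can be sketched in Python as follows. The function name `mat_mul` is a hypothetical choice, matrices are stored as lists of rows, and the size check mirrors the condition that the number of columns of the first factor must equal the number of rows of the second. The sample data are a 3 × 3 and a 3 × 2 matrix, so the product exists in one order only.

```python
def mat_mul(s, t):
    """Product q = st with q[k][i] equal to the sum over j of s[k][j] * t[j][i], as in (6.3)."""
    if len(s[0]) != len(t):
        raise ValueError("columns of the first factor must equal rows of the second")
    return [[sum(s[k][j] * t[j][i] for j in range(len(t)))
             for i in range(len(t[0]))]
            for k in range(len(s))]

s = [[3, 1, 2],
     [0, 1, 1],
     [1, 2, 0]]
t = [[1, 4],
     [2, 2],
     [1, 0]]

q = mat_mul(s, t)

# Reversing the factors asks for a 3 x 2 times 3 x 3 product, which is not defined.
try:
    mat_mul(t, s)
    reversed_defined = True
except ValueError:
    reversed_defined = False
```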
† We chose different indices i, j, k in (6.1) to avoid conflict in (6.2).




In the following example the factors on the left can be interchanged. We see 
that matrix multiplication is not commutative, in general, even when it makes sense 
to reverse the order of the factors: 


(2a dG 2) 
(202 )G 2) 


However, matrix multiplication is associative whenever it is defined. More
precisely, let a, b, c be matrices of sizes m × n, n × r, and r × s, respectively.
Then, by Definition 6.1, ab is an m × r matrix, and so (ab)c is defined and is
m × s. Similarly, bc is an n × s matrix, and so a(bc) is defined and is m × s.
We have


(6.4)    a(bc) = (ab)c


This follows easily from (4.2), of which it is the matricial form, but it can be proved
directly without trouble from Definition 6.1.
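
Both facts, that square matrices need not commute and that the product is nevertheless associative, can be observed on small cases. The following Python sketch uses hypothetical 2 × 2 matrices chosen for illustration; `mat_mul` is an assumed helper that implements the rule (6.3) on lists of rows.

```python
def mat_mul(s, t):
    """Matrix product on lists of rows, following (6.3)."""
    return [[sum(s[k][j] * t[j][i] for j in range(len(t)))
             for i in range(len(t[0]))]
            for k in range(len(s))]

a = [[1, 1],
     [0, 1]]
b = [[1, 0],
     [1, 1]]
c = [[0, 1],
     [1, 0]]

ab = mat_mul(a, b)
ba = mat_mul(b, a)

left = mat_mul(mat_mul(a, b), c)    # (ab)c
right = mat_mul(a, mat_mul(b, c))   # a(bc)
```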
We have further, for any m × n matrix a,


(6.5)    $I_m \cdot a = a \cdot I_n = a$


where $I_m$ and $I_n$ are unit matrices (5.12). For example, the (k, i)-element of $I_ma$ is

$\delta^k_ja^j_i = \delta^k_1a^1_i + \delta^k_2a^2_i + \cdots + \delta^k_ka^k_i + \cdots + \delta^k_ma^m_i$

by (6.3), and this sum is equal to $a^k_i$, by (5.13). Equation (6.5) has the following
meaning: Let A: U → V be a linear mapping of vector spaces, where dim U = n
and dim V = m. Let a be the matrix of A with respect to some base pair $\{u_i\}$, $\{v_j\}$.
If I denotes the identity mapping of V, then I ∘ A = A, of course. The matrix of I
relative to the base $\{v_j\}$ is $I_m$, and so the matrix of I ∘ A relative to the base pair
$\{u_i\}$, $\{v_j\}$ is $I_m \cdot a$, which must be the same as a. Similarly, if I’ denotes the identity
mapping of U, then A ∘ I’ = A. The matrix of A ∘ I’ relative to the base pair
$\{u_i\}$, $\{v_j\}$ is $a \cdot I_n$, which must be equal to a.

Let us now consider matrix products involving row vectors and column vectors.
It follows readily from Definition 6.1 that the product of two row vectors, or of two
column vectors, is not defined unless at least one of them is a scalar (1 × 1 matrix).
Let us then consider the product of a row vector $x = (x_1, \ldots, x_m)$ in $K_m$ and of a
column vector


$y = \begin{pmatrix} y^1 \\ \vdots \\ y^n \end{pmatrix}$


in $K^n$. The product yx is always defined and is the n × m matrix


$yx = \begin{pmatrix} y^1x_1 & \cdots & y^1x_m \\ \vdots & & \vdots \\ y^nx_1 & \cdots & y^nx_m \end{pmatrix}$


But the product xy is not defined unless m = n, in which case it is a 1 × 1 matrix,
that is, the scalar


(6.6)    $xy = x_iy^i = x_1y^1 + \cdots + x_ny^n$

† With coefficients in the same field, naturally.


With x in $K_m$ and y in $K^n$, as above, let a be an element of $K^m_n$, that is, an m × n
matrix with coefficients in K. Then ay is defined and is m × 1, hence is again a
column vector. Similarly, xa is defined and is 1 × n, a row vector.

If b is an element of $K^n_r$, then c = ab is defined and is in $K^m_r$. With the
foregoing remarks in view it is easy to verify the following rules, which summarize the
relations between a, b, and c:


(6.7)
$c = ab$
$c^k_i = a^k_jb^j_i$
$c_i = ab_i$
$c^k = a^kb$
$c^k_i = a^kb_i$


The first two equations are merely the definition of the product ab. The third
equation says that the ith column of c is equal to a times the ith column of b.
Similarly, the fourth equation states that the kth row of c is the product of the
kth row of a with b. The fifth equation is (6.6) applied to $a^k$ and $b_i$.
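
The column and row rules can be checked on a small case. The sketch below is a hedged Python illustration: the helpers `mat_mul`, `column`, and `row` are hypothetical names, the matrices are arbitrary sample data, and rows and columns are kept as 1 × n and m × 1 matrices so that the same product routine applies to them.

```python
def mat_mul(s, t):
    """Matrix product on lists of rows, following (6.3)."""
    return [[sum(s[k][j] * t[j][i] for j in range(len(t)))
             for i in range(len(t[0]))]
            for k in range(len(s))]

def column(m, i):
    """The i-th column of m, as a matrix with a single column."""
    return [[r[i]] for r in m]

def row(m, k):
    """The k-th row of m, as a matrix with a single row."""
    return [list(m[k])]

a = [[1, 2], [3, 4], [5, 6]]       # a 3 x 2 matrix
b = [[1, 0, 2], [-1, 3, 1]]        # a 2 x 3 matrix
c = mat_mul(a, b)

columns_agree = all(column(c, i) == mat_mul(a, column(b, i)) for i in range(3))
rows_agree = all(row(c, k) == mat_mul(row(a, k), b) for k in range(3))
entries_agree = all(c[k][i] == mat_mul(row(a, k), column(b, i))[0][0]
                    for k in range(3) for i in range(3))
```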

We point out that (5.7) can be expressed as a matrix product. For let x be an
element of the n-dimensional space U, x having components $x^1, x^2, \ldots, x^n$ relative
to a base $\{u_i\}$, so that $x = x^iu_i$. Because of our index convention we now consider
the n-tuple formed of $x^1, x^2, \ldots, x^n$ as an element of $K^n$, that is, as a column vector. To

prevent confusion with x we denote that column vector by x., 


and we call it the component vector of x (relative to the given base). This being 
established, we have the following proposition: 


PROPOSITION 6.1 Let T: U → V be a linear mapping of finite-dimensional vector
spaces, and let t be its matrix relative to some base pair $\{u_i\}$, $\{v_j\}$. If x in U has
component vector x, relative to $\{u_i\}$, then the component vector of T(x) relative to $\{v_j\}$ is the
matrix product xe.

This follows immediately from (5.7) and the definition of matrix multiplication.
The proposition below is of a similar nature.




PROPOSITION 6.2 K being a field, let a be an element of $K^m_n$, that is, an m × n matrix
with coefficients in K. Then the operation that assigns to any vector x in $K^n$ the new
vector ax is a linear mapping A from $K^n$ to $K^m$. Furthermore, a is the matrix of A
relative to the canonical bases.† Similarly, the operation that sends any vector y in $K_m$
into ya is a linear mapping from $K_m$ to $K_n$, and a is its matrix relative to the canonical
base pair. If B: $K^n \to K^m$ is any linear mapping, and if its matrix relative to the
canonical bases is denoted by b, then B(x) = bx for any x in $K^n$. If B’: $K_m \to K_n$ is
a linear mapping, with matrix b relative to the canonical bases, then B’(y) = yb for
any y in $K_m$.

The verifications involved here are very simple and straightforward and are 
therefore omitted. 


We observe that if a and b are two square matrices of the same size, say n × n,
then ab is also n × n. Hence, matrix multiplication is a binary operation in $K^n_n$,
the set of all n × n matrices with coefficients in K. The operation is associative,
by (6.4). In particular, we can form $a^2 = a \cdot a$, $a^3 = a \cdot a \cdot a$, etc., for any a in $K^n_n$
(we are using Sec. 7E, Chap. 2, here).

We now introduce two other very simple operations on matrices: If $a = (a^j_i)$
is any element of $K^m_n$, and if p is any element of K, then we define pa by


(6.9)    $pa = (pa^j_i)$


That is, pa is the m × n matrix whose (j, i) element is $pa^j_i$. If $b = (b^j_i)$ is another
matrix in $K^m_n$, then we define a + b by


(6.10)    $a + b = (a^j_i + b^j_i)$


Thus a + b has $a^j_i + b^j_i$ as its (j, i) element. These definitions are obviously
straightforward extensions of the operations defined in $K_n$ in Sec. 4, Chap. 8. In
fact they are the same definitions if one imagines the elements of an m × n matrix
written out in a single row. It follows at once that $K^m_n$, with the operations (6.9)
and (6.10), is a vector space of dimension mn over K.

These operations and matrix multiplication satisfy the following relations,
whenever the indicated operations are defined (p denotes an arbitrary scalar; a, b, c
denote arbitrary matrices with coefficients in K):


(6.11)
a(bc) = (ab)c
(a + b)c = ac + bc
c(a + b) = ca + cb
p(a + b) = pa + pb
p(ab) = (pa)b = a(pb)

These rules of calculation are the matricial counterparts of (4.2) and (4.5). We
have repeated (6.4). All the equations follow trivially from the definitions of the
operations and from the field axioms for K. It is quickly seen that $K^n_n$ is a
K-algebra.


† The canonical base of $K^n$ consists of the n columns $e_1, e_2, \ldots, e_n$ of the unit matrix $I_n$.
The canonical base of $K_n$ consists of the n rows $e^1, e^2, \ldots, e^n$ of $I_n$. Cf. Remark 3,
Sec. 9, Chap. 8.


PROPOSITION 6.3 Let U, V be vector spaces over a field K of dimensions n, m,
respectively. Let $\{u_i\}$ and $\{v_j\}$ be bases for them. Then the operation that associates
with any linear mapping T: U → V its matrix t relative to the given base pair is an
isomorphism of the vector space Hom (U, V) to $K^m_n$.

This follows immediately from Definitions 4.1 and 6.1 and from (6.9) and (6.10).
There is no operation of “product” defined for elements of Hom (U, V), in general,
but there is one in Hom (U, U). We have the following theorem:


THEOREM 6.4 Let U be a vector space of dimension n over K, and let $\{u_i\}$ be a base
for U. Then the operation that assigns to a linear mapping T: U → U its matrix t
relative to the base $\{u_i\}$ is an isomorphism from the K-algebra Hom (U, U) to the
K-algebra $K^n_n$.

An isomorphism of K-algebras is a one-to-one mapping which is compatible
with addition, multiplication, and scalar multiplication. That the mapping in
question is compatible with addition and scalar multiplication is stated in
Proposition 6.3. Its compatibility with the product follows from Definition 6.1 (see the
remark following it).

These two theorems show that linear mappings of finite-dimensional vector 
spaces can be replaced by matrices. For purposes of calculation this is often 
essential. 

We now introduce one more operation on matrices. It is a very trivial one, but
it is often useful.

Let $a = (a^i_j)$ be an element of $K^m_n$. Then the transpose of a, denoted by ${}^ta$, is
the matrix $({}^ta^j_i)$ in $K^n_m$ such that


${}^ta^j_i = a^i_j$


That is, the (j, i) element of ${}^ta$ is the same as the (i, j) element of a.


Example The transpose of

$\begin{pmatrix} 2 & 1 \\ 3 & 4 \\ 1 & 2 \end{pmatrix}$ is $\begin{pmatrix} 2 & 3 & 1 \\ 1 & 4 & 2 \end{pmatrix}$


The transpose operation merely interchanges rows and columns of a matrix. It
is clear that ${}^t({}^t a) = a$. That is, the transpose operation applied twice gives back
the original matrix. The important properties of the transpose operation are
as follows:

${}^t({}^t a) = a$
${}^t(a + b) = {}^t a + {}^t b$
${}^t(\rho a) = \rho \cdot {}^t a$
${}^t(ab) = {}^t b \, {}^t a$




The first three rules are completely trivial. The second and third allow us to
conclude that transposition is an isomorphism from $K^m_n$ to $K^n_m$. To prove the last
equation we have:

[(i, k) element of ${}^t(ab)$] = [(k, i) element of (ab)] = $a^k_j b^j_i = b^j_i a^k_j$ =
${}^t b^i_j \, {}^t a^j_k$ = [(i, k) element of ${}^t b \, {}^t a$]

The equation is easily checked by numerical examples. 
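
One such numerical check can be sketched in a few lines of Python. The two matrices are arbitrary illustrative choices (they do not come from the text), and the helper names `transpose` and `matmul` are our own; matrices are stored as lists of rows.

```python
def transpose(a):
    """Return the transpose of a matrix stored as a list of rows."""
    return [list(col) for col in zip(*a)]


def matmul(a, b):
    """Product of an m x n matrix a and an n x p matrix b (lists of rows)."""
    return [[sum(a[i][j] * b[j][k] for j in range(len(b)))
             for k in range(len(b[0]))]
            for i in range(len(a))]


# Illustrative matrices only: a is 2 x 3 and b is 3 x 2.
a = [[1, 2, 0],
     [-1, 3, 4]]
b = [[2, 1],
     [0, -1],
     [5, 3]]

lhs = transpose(matmul(a, b))             # transpose of the product ab
rhs = matmul(transpose(b), transpose(a))  # product of the transposes, reversed
print(lhs == rhs)
```

Note that the order of the factors must be reversed: for these shapes the product of the transposes in the original order would be a 3 x 3 matrix, while the transpose of ab is 2 x 2.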

We mention here some terminology which is often used in connection with 
matrices. 

Let $a = (a^i_j)$ be any m × n matrix. Then the coefficients $a^1_1, a^2_2, a^3_3$, etc.,
are called the diagonal coefficients of a (the term is usually restricted to square
matrices). The matrix a is called a diagonal matrix if all terms not on the diagonal
are zero. For example, $I_n$ is a diagonal matrix.

The matrix a is called triangular if all the entries below the diagonal are zero,
that is, $a^i_j = 0$ if i > j. For example,


213 4 
0112 
00 38 1 


is triangular. The term is usually restricted to square matrices.
A square matrix a is called symmetric if ${}^t a = a$, that is, if $a^i_j = a^j_i$ for all relevant
i and j.


EXERCISES 
1. Work out the products ab and ba, where 
; 14 2 0 
a=(, y 1 b=[ 55 06 
-1 2 -1 38 
1-1 4. 


2. Compute ba. What 


2. Find a matrix b such that ab = Is, where a = ( 4 


happens if you take a = (2 “), 


3. Compute ${}^t b \, {}^t a$, where a, b are the matrices of Exercise 1.
4. Let a be an element of $K^m_n$, and let x, y be elements of $K^n$ and $K^m$, respectively.
Show that the equations

$ax = y$

$x^1 a_1 + \cdots + x^n a_n = y$

and

$a^1_1 x^1 + \cdots + a^1_n x^n = y^1$
$\cdots$
$a^m_1 x^1 + \cdots + a^m_n x^n = y^m$

are all equivalent.




5. Referring to Exercise 4, write out the equation ${}^t x \, {}^t a = {}^t y$, exhibiting the
coefficients of the three matrices.

6. Let a be an element of $K^n_n$, and suppose that its columns are linearly
independent vectors. Prove that there exists a unique vector $b_j$ such that $a b_j = e_j$,
where $e_j$ is the jth column of $I_n$. Compute ab, where b is the n × n matrix whose
columns are $b_1, \dots, b_n$.

7. Compute the following matrix product, where t is an indeterminate over Q.


. 1 
fg 1-5 1! 
24+ 1 at e 
eof 0 


8. Let a, b be triangular matrices in $K^n_n$. Prove that ab is triangular.

*9. Let a be an element of $K^n_n$, and suppose that ab = ba for every matrix b
in $K^n_n$. Prove that $a = \alpha \cdot I_n$, where $\alpha$ is a scalar. Matrices of this type are called
scalar matrices. Prove that the set of all scalar matrices in $K^n_n$ is a subring which
is isomorphic to K.

10. Prove that $K^m_n$ is a vector space of dimension mn over K.
11. Let a be a matrix in $K^n_n$ with $a^i_j = 0$ for $i \geq j$. Prove that $a^n = 0$. [Hint:
Experiment with a 3 × 3 matrix.]

*12. Let $a = (a^i_j)$ be an n × n matrix. Show that the following two conditions
on a are equivalent:

(a) The matrix a represents a linear mapping $A: K^n \to K^n$ such that the matrix
of A with respect to some base has the form

$\begin{pmatrix} b & c \\ 0 & d \end{pmatrix}$

where b, d are square matrices.
(b) The matrix a represents a linear mapping $A: K^n \to K^n$ which possesses an
invariant subspace L of dimension < n, that is, $A(L) \subset L$. A matrix satisfying any
of these conditions is called reducible; otherwise, it is irreducible.


*13. If $a = (a^i_j)$ is an irreducible real n × n matrix and $a^i_j \geq 0$, then every entry
of the matrix

$(I + a)^{n-1}$

is positive.
[Hint: Prove that if u is a nonzero vector whose components are non-negative, then
the vector (I + a)u has fewer zero components than u.]

14. An important aspect of recent mathematical economics is the study of the
"input-output" model of an economy.† The viewpoint implicit in this model is
that an economy is a system where many different types of goods and services
(called commodities) are produced and then used up in the production of still more
commodities. Let $C_1, C_2, \dots, C_n$ be the commodities produced in an economy
and suppose that 7‘; is the amount of C; consumed to produce a unit of C, and that 
yg’; is the amount of C; used (but not necessarily used up) in the production of a 
unit of Cj. Thus rua 0, 2rd = 0, but ¢"heead #0. Assuming that 
the price of a commodity is the sum of the price of the commodities used up in its 
production plus p (the “rate of profit,” assumed constant for all commodities) times 
the price of the commodities used, write a matrix equation relating 1',, yg’; and the 
prices of $C_1, C_2, \dots, C_n$. The study of the required equation is an interesting
problem, for the solution of which we refer to Schwartz's book. Let us merely
mention here that a positive rate of profit p exists if and only if there is a program of
production in the economy which yields a surplus, i.e., for which the amount of
every commodity produced is greater than or equal to the amount consumed.


7. Change of base 


In Sec. 9, Chap. 8, we have already considered briefly the matter of changing bases 
in vector spaces. The calculations involved can be expressed very compactly by 
the use of matrices. 

Let U be an n-dimensional vector space over a field K, and let $u_1, \dots, u_n$ and
$u'_1, \dots, u'_n$ be two bases for U. Then any element in one of the bases can be
expressed uniquely as a linear combination of elements in the other base. Hence,
the two bases will be related by equations of the following type [cf. Eq. (9.5),
Chap. 8]:

7.1    $u'_i = p^j_i u_j = p^1_i u_1 + \cdots + p^n_i u_n$
       $u_j = q^k_j u'_k = q^1_j u'_1 + \cdots + q^n_j u'_n$

We have here two n × n matrices, $p = (p^i_j)$ and $q = (q^i_j)$.
Now let x be an element of U, and let $x_u$, $x_{u'}$ be its component vectors relative to
the two bases. That is,

$x_u = \begin{pmatrix} x^1 \\ \vdots \\ x^n \end{pmatrix}$ and $x = x^i u_i$


† See, for example, J. T. Schwartz, "Lectures on the Mathematical Method in Analytical
Economics," from which we borrow the above considerations.




Similarly,

$x_{u'} = \begin{pmatrix} x'^1 \\ \vdots \\ x'^n \end{pmatrix}$ and $x = x'^i u'_i$

In the last equation substitute $u'_i = p^j_i u_j$ from (7.1). There results $x = x'^i p^j_i u_j$.
The coefficient of $u_j$ here is then $p^j_i x'^i$, and that must be the same as $x^j$. Thus
$x^j = p^j_i x'^i$. Similarly, substituting $u_j = q^i_j u'_i$ into $x = x^j u_j$ we get $x'^i = q^i_j x^j$.
From Definition 6.1 it follows that the two equations just derived can be expressed
as matrix products. We have

7.2    $x_u = p \, x_{u'}$ and $x_{u'} = q \, x_u$

[Cf. Eq. (9.6), Chap. 8]

The two matrices p and q of (7.1) are very intimately related. For ease of
reference we shall call p the matrix from the base $\{u_i\}$ to the base $\{u'_i\}$, and q the
matrix from $\{u'_i\}$ to $\{u_i\}$. We shall show that $pq = qp = I_n$ (the n × n unit matrix).
In fact, substitute the second equation of (7.1) into the first, getting $u'_i = p^j_i q^k_j u'_k$.
Equating coefficients on both sides (Proposition 6.1, Chap. 8), we see that $p^j_i q^k_j =
q^k_j p^j_i$ must be zero if k ≠ i and 1 if k = i. Thus

$q^k_j p^j_i = \delta^k_i = \begin{cases} 1 & \text{if } k = i \\ 0 & \text{if } k \neq i \end{cases}$

Therefore, by (5.12), (8.18) and the definition of matrix multiplication, we have
$qp = I_n$. In like manner we can substitute the first equation of (7.1) into the
second. To do so we must first change i to k; and j in the first equation must be
changed to some other letter, say h, in order to avoid conflict with the j in the
second equation. We get $u_j = q^k_j p^h_k u_h$, from which we conclude that $p^h_k q^k_j = \delta^h_j$,
just as above. By Definition 6.1, this is precisely the matrix equation $pq = I_n$.

Now $I_n$ is the unit element of the ring $K^n_n$ of all n × n matrices with coefficients
in K. From our general definition of inverses (Definition 4.2, Chap. 1) it follows
that the matrices p, q above are inverses of each other (i.e., with respect to the
product operation). We therefore write $p = q^{-1}$ or $q = p^{-1}$.
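
These relations can be illustrated by a small numerical sketch. We assume U is the space of columns with two rational components and that the original base is the canonical one, so that the new base vectors are simply the columns of p. The matrices p and q and the helper names `matmul` and `matvec` are illustrative choices of ours, not taken from the text.

```python
def matmul(a, b):
    """Product of two matrices stored as lists of rows."""
    return [[sum(a[i][j] * b[j][k] for j in range(len(b)))
             for k in range(len(b[0]))]
            for i in range(len(a))]


def matvec(a, x):
    """Product of a matrix with a column vector stored as a flat list."""
    return [sum(a[i][j] * x[j] for j in range(len(x))) for i in range(len(a))]


p = [[2, 1],
     [1, 1]]    # matrix from the old base to the new one
q = [[1, -1],
     [-1, 2]]   # matrix from the new base back to the old one
identity = [[1, 0],
            [0, 1]]

x_new = [3, -1]           # components relative to the new base
x_old = matvec(p, x_new)  # components relative to the old base
print(matmul(p, q) == identity, matmul(q, p) == identity)
print(x_old, matvec(q, x_old))
```

Here the new base vectors are the columns (2, 1) and (1, 1) of p, and 3 times the first minus the second is indeed the column (5, 2) returned for `x_old`.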

We observe that p is none other than the matrix, with respect to the base $\{u_i\}$,
of the linear transformation P: U → U that maps $u_i$ into $u'_i$ for i = 1, ..., n.
P is clearly an isomorphism. Thus any change of base in U can be described by an
isomorphism from U to itself.


DEFINITION 7.1 A matrix in $K^n_n$ is called nonsingular if it has an inverse; otherwise
it is said to be singular.

The following proposition shows that any nonsingular matrix can be used to de- 
fine a change of base. 




PROPOSITION 7.1 Let U be an n-dimensional vector space over K, and let $u_1, \dots,
u_n$ be a base for U. If $p = (p^i_j)$ is any nonsingular matrix in $K^n_n$, then the n elements
$u'_i = p^j_i u_j$ (i = 1, ..., n) form a base for U.
Proof. If the $u'_i$ were linearly dependent, then there would be a nonzero
column vector $a = (a^i)$ in $K^n$ such that $a^i u'_i = 0$. That is, $a^i p^j_i u_j = 0$.
From this we conclude that $p^j_i a^i = 0$ for j = 1, ..., n, because of the
linear independence of $u_1, \dots, u_n$. By Definition 6.1 we have then
pa = 0. By assumption, p has an inverse matrix $p^{-1}$. Multiplying the
last equation by it we get $p^{-1}(pa) = 0$, or $(p^{-1}p)a = 0$, or $I_n a = 0$, whence
a = 0 by (6.5), a contradiction. Q.E.D.


We can now calculate very easily the effect of a change of base on the matrix of 
a linear transformation. 


THEOREM 7.2 Let T: U → V be a linear mapping of an n-dimensional vector space
over K to an m-dimensional vector space over K. Let $\{u_i\}$ and $\{u'_i\}$ be two bases for U,
and let p be the matrix from the former to the latter. Let $\{v_j\}$ and $\{v'_j\}$ be two bases
for V, and let q be the matrix from $\{v_j\}$ to $\{v'_j\}$. Finally, let t be the matrix of T
relative to the base pair $\{u_i\}$, $\{v_j\}$; and let t' be its matrix relative to the base pair $\{u'_i\}$,
$\{v'_j\}$. Then

7.3    $t' = q^{-1} t p$

Proof. By assumption we have $u'_i = p^j_i u_j$. Therefore $T(u'_i) =
T(p^j_i u_j) = p^j_i T(u_j)$. By definition of t, $T(u_j) = t^k_j v_k$, and so we have
$T(u'_i) = p^j_i t^k_j v_k$. By definition of t', $T(u'_i) = t'^h_i v'_h$; and $v'_h = q^k_h v_k$, by
definition of q. Hence, $T(u'_i) = t'^h_i q^k_h v_k$. Comparing the two expressions for
$T(u'_i)$ we get $p^j_i t^k_j v_k = t'^h_i q^k_h v_k$. The coefficients of $v_k$ on both sides must
be the same, and so† $t^k_j p^j_i = q^k_h t'^h_i$. From the definition of matrix
multiplication this says that tp = qt', from which (7.3) follows by multiplying
both sides by $q^{-1}$ on the left. Q.E.D.


In the special case of an endomorphism T: U → U, if t is its matrix relative to
$\{u_i\}$ and if t' is its matrix relative to $\{u'_i\}$, then (7.3) reduces to

7.4    $t' = p^{-1} t p$
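
The rule for the new matrix can be checked numerically. In the following sketch we assume U and V are the spaces of rational columns with three and two components, each with its canonical base as the original base; the matrices t, p, q and the hand-computed inverse `q_inv` are illustrative assumptions of ours, not data from the text.

```python
def matmul(a, b):
    """Product of two matrices stored as lists of rows."""
    return [[sum(a[i][j] * b[j][k] for j in range(len(b)))
             for k in range(len(b[0]))]
            for i in range(len(a))]


t = [[1, 1, 0],
     [1, -1, 1]]   # matrix of T relative to the original base pair
p = [[1, 1, 0],
     [0, 1, 1],
     [0, 0, 1]]    # matrix from the old base of U to the new one
q = [[1, 1],
     [0, 1]]       # matrix from the old base of V to the new one
q_inv = [[1, -1],
         [0, 1]]   # inverse of q, found by hand for this small case

t_new = matmul(q_inv, matmul(t, p))   # the matrix t' predicted by the theorem

# Column i of tp holds the components of T(u'_i) in the old base of V, and
# column i of q t' holds the same vector rebuilt from its new components.
print(matmul(t, p) == matmul(q, t_new))
print(t_new)
```

The comparison of tp with qt' is exactly the equation obtained in the proof before both sides are multiplied by the inverse of q.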


The following theorem shows that it is always possible to choose bases in U and
V such that the matrix of a linear mapping T: U → V is a diagonal matrix as
defined at the end of Sec. 6.


THEOREM 7.3 Let T: U → V be a linear mapping of finite-dimensional vector spaces
over a field K, and let T have rank r. Then there exist bases $\{u_i\}$ and $\{v_j\}$ of U and V
for which the matrix t of T has the following form:

7.5    $t^i_i = 1$ for i = 1, ..., r (no summation on i)
       $t^i_j = 0$ otherwise

Conversely, if the matrix t has this form, then the mapping T has rank r.

† As on several earlier occasions we have reversed the order of the factors so that they will
appear in the same order as in the matrix equations that follow.


Proof. Let n = dim U. Then dim Ker T = n − r, by Theorem 3.2.
Let $u_{r+1}, u_{r+2}, \dots, u_n$ be a base for Ker T. By Theorem 9.3, Chap. 8,
this set can be enlarged to a base for U by adding on r new vectors $u_1,
u_2, \dots, u_r$. Set $v_j = T(u_j)$ for j = 1, ..., r. Just as in the proof of
Theorem 3.2, the vectors $v_1, \dots, v_r$ form a base for Im T. By
Theorem 9.3, Chap. 8, again, this set can be enlarged to a base for V by adding
on a certain number of new elements $v_{r+1}, v_{r+2}$, etc. The matrix of T relative
to the base pair $\{u_i\}$, $\{v_j\}$ has the form stated in (7.5). The last part of
the theorem is trivial. Q.E.D.


Now a linear mapping T is usually specified by giving its matrix t relative to
some base pair. In order to get its matrix in the form (7.5), it is necessary to
change the bases in U and V. That is, it is necessary to choose p and q in (7.3) in
such a way that t' has the form (7.5). In Sec. 9 we shall see how to do that
effectively. A somewhat different problem arises in the case of endomorphisms, for
then it is usual to define the matrix of a mapping in terms of a single base $\{u_i\}$,
rather than a pair of independent bases $\{u_i\}$, $\{v_j\}$. Then there is only one base
to change in order to simplify the matrix. That is, we have only p at our disposal
in (7.4), rather than both p and q in (7.3). It is to be expected that the problem
of simplifying the matrix will be more difficult in this situation because of the
decreased freedom. The problem is a very important one, and it will be discussed
in detail in Chap. 13.

As a complement to Theorem 7.3 we have the following theorem: 


THEOREM 7.4 If the matrix t of a linear mapping T: U → V has the form (7.5)
relative to a base pair $\{u_i\}$, $\{v_j\}$, then $u_{r+1}, u_{r+2}, \dots, u_n$ form a base for Ker T and
$T(u_1), \dots, T(u_r)$ form a base for Im T.

The proof is very straightforward and is similar to that of Theorem 3.2.


EXERCISES 
1, Let fu, uz, us} and {v1 v2} be bases for vector spaces U, V over Q. Let T 
be the mapping from U to V defined by T(u:) = vi + v2, T(z) = vi — v2, Tus) = 
vo. Let new bases {uf} and {v3} be defined by uf = 2u, — uy + us, a} =n + 
Uz — Us, uf = 2u, — ugand vf = ve, v3 = ¥, + ve. Calculate the matrix of T rela- 
tive to the new base pair, and verify (7.8). 
2. Find new bases for U and V above such that the matrix of T has the form (7.5). 
3. Let {uj, uz, us} be a base for a vector space U over Q, and let T be the endo- 
morphism defined by T(un) =u + us + Us, Ts) =u» — us, Tus) =u — Us 
Let {ui} be the new base defined by uj = 2uy — us, 0} = ur. + ue + Bus, uf = 
Caleulate the matrix of T relative to the new base, and verify (7.4). 




4. Prove that a triangular matrix in $K^n_n$ is nonsingular if and only if its diagonal
elements are all nonzero. Show that its inverse must also be triangular and show
how to calculate it. [Hint: Consider the matrix as defining a change of base in a
vector space.]

5. Prove that all nonsingular matrices in $K^n_n$ form a group, with matrix
multiplication as the group operation. Show that the nonsingular triangular matrices form
a subgroup.

6. Let $\{u_i\}$, $\{v_i\}$, and $\{w_i\}$ be three bases in a vector space. Let p be the matrix
from $\{u_i\}$ to $\{v_i\}$, and let q be the matrix from $\{v_i\}$ to $\{w_i\}$. Show that pq is the
matrix from $\{u_i\}$ to $\{w_i\}$.


8. Rank of a matrix; linear equations; subspaces 


K being a field, consider an m × n matrix $a = (a^i_j)$ with coefficients in K. The
columns $a_1, a_2, \dots, a_n$ of a are elements of $K^m$, and they span a certain subspace
C of $K^m$. The dimension of C, that is, the maximum number of linearly independent
column vectors in a, is called the column rank of a.

Similarly, the rows $a^1, a^2, \dots, a^m$ are elements of $K_n$, and they span a certain
subspace R of $K_n$. The number dim R, that is, the maximum number of linearly
independent row vectors in a, is called the row rank of a.


THEOREM 8.1 For any matrix a in $K^m_n$, the row rank and column rank are equal.
Their value is called the rank of a.
Proof. Let r be the column rank. By its definition there are r linearly
independent columns in a. Let us call them $a'_1, \dots, a'_r$. They span C,
and so each column of a can be expressed as a linear combination of them,
say

$a_i = \sum_{j=1}^{r} p^j_i a'_j \quad (i = 1, \dots, n)$

Then

$a^k_i = \sum_{j=1}^{r} p^j_i a'^k_j$

where $a'^k_j$ is the kth element in the column called $a'_j$. Now $(p^j_i)$ is an
r × n matrix, and the last equation, written in terms of rows, is

$a^k = \sum_{j=1}^{r} a'^k_j p^j \quad (k = 1, \dots, m)$

This says that the rows of a are all in the subspace of $K_n$ spanned by the r
vectors $p^1, p^2, \dots, p^r$. That subspace therefore has dimension ≤ r
(Theorem 8.1, Chap. 8). Hence we have row rank ≤ column rank. By
applying the same argument to the transpose ${}^t a$ of a we conclude similarly
that column rank ≤ row rank. Q.E.D.
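
The equality of the two ranks can be observed numerically. The sketch below computes the row rank by Gaussian elimination in exact rational arithmetic (a standard technique, swapped in here in place of the argument of the proof) and obtains the column rank as the row rank of the transpose. The function names and the sample matrix are illustrative assumptions of ours.

```python
from fractions import Fraction


def transpose(a):
    """Return the transpose of a matrix stored as a list of rows."""
    return [list(col) for col in zip(*a)]


def row_rank(matrix):
    """Dimension of the space spanned by the rows, by Gaussian elimination."""
    rows = [[Fraction(x) for x in row] for row in matrix]
    rank = 0
    ncols = len(rows[0]) if rows else 0
    for col in range(ncols):
        # Look for a row at or below position `rank` with a nonzero entry here.
        pivot = next((i for i in range(rank, len(rows)) if rows[i][col] != 0), None)
        if pivot is None:
            continue
        rows[rank], rows[pivot] = rows[pivot], rows[rank]
        for i in range(rank + 1, len(rows)):
            factor = rows[i][col] / rows[rank][col]
            rows[i] = [x - factor * y for x, y in zip(rows[i], rows[rank])]
        rank += 1
    return rank


# Illustrative matrix: the second row is twice the first.
a = [[1, 2, 3],
     [2, 4, 6],
     [1, 0, 1]]

print(row_rank(a), row_rank(transpose(a)))
```

Both numbers printed are the rank of a in the sense of the theorem.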


The next theorem connects the notions of rank of a matrix and rank of a linear 
mapping. 


THEOREM 8.2 Let T: U → V be a linear mapping of two vector spaces of finite
dimension, and let t be its matrix relative to some base pair. Then the rank of T is equal to
the rank of t.

Proof. Let $u_1, \dots, u_n$ and $v_1, \dots, v_m$ be the base pair in question,
so that $T(u_i) = t^j_i v_j$. Let $c^1, \dots, c^n$ be n scalars. Then $c^i \cdot T(u_i) =
c^i t^j_i v_j$, and this is zero if and only if $c^i t^j_i = 0$ for j = 1, ..., m, because
of the linear independence of the $v_j$, and that is so if and only if $c^i t_i = 0$.
Thus $c^1 \cdot T(u_1) + \cdots + c^n \cdot T(u_n) = 0$ if and only if $c^1 t_1 + \cdots +
c^n t_n = 0$. It follows readily from this that a certain number of the vectors
$T(u_1), \dots, T(u_n)$ are linearly independent if and only if the column
vectors $t_i$ with the same indices are linearly independent. The assertion
of the theorem is an immediate consequence. Q.E.D.


THEOREM 8.3 A matrix a in $K^n_n$ is nonsingular if and only if it has rank n, and that
is so if and only if the mapping of $K^n$ to itself that sends an arbitrary vector x into ax
is an isomorphism.
Proof. By Proposition 6.2 the mapping $A: K^n \to K^n$ defined by A(x) =
ax is a linear mapping whose matrix relative to the canonical base is a.
If a is nonsingular, let b be its inverse matrix, and let B be the mapping
of $K^n$ defined by B(x) = bx. The mapping AB has matrix $ab = I_n$
(relative to the canonical base) and is therefore the identity mapping. Hence
A must be onto, because every x in $K^n$ is the image A(B(x)). Its rank is
therefore n, and by the theorem above, n is also the rank of a.
Conversely, if a has rank n, then so does A. Hence A is an isomorphism,
by Theorem 3.2 and its corollary. Let $A^{-1}$ be its inverse mapping, and
let b be the matrix of $A^{-1}$ (relative to the canonical base).
Then the matrix of $AA^{-1} = I$ is ab, which must accordingly be the unit
matrix. Similarly, the matrix of $A^{-1}A = I$ is ba, whence also $ba = I_n$.
Q.E.D.


Let a be any matrix. By a submatrix of a is meant any of the matrices that are 
obtained by striking out given rows or columns of a. 


COROLLARY Let a be an m × n matrix with coefficients in K, and let r be its rank.
Then r is equal to the size of the largest nonsingular square submatrix of a.

Proof. By assumption, some r rows of a are linearly independent.
Fixing some such set of r rows, let a' be the r × n submatrix obtained by
striking out the other m − r rows of a. The matrix a' has r linearly
independent rows, and therefore r of its columns are linearly independent,
by Theorem 8.1. Fixing such a set of r columns, let a'' be the submatrix of
a' obtained by striking out the other n − r columns. Then a'' clearly has
rank r, and it is an r × r submatrix of a; a'' is nonsingular, by the
preceding theorem. Conversely, if b is an s × s nonsingular submatrix of a, then
the rows of b are linearly independent, by the theorem above. The
corresponding rows of a must be linearly independent, a fortiori, showing that
a has rank ≥ s. That is, r ≥ s. Q.E.D.


REMARK. The inverse of a nonsingular matrix is unique, by Theorem 4.3, Chap. 1.
But we can prove a stronger result. If a and b are elements of $K^n_n$, and if $ab = I_n$,
then a and b are inverses of each other. In fact, this follows at once from the first
part of the proof above. For then AB = I; hence also BA = I, obviously. The
matrix of BA is ba, and so $ba = I_n$.


THEOREM 8.4 Let a be an element of $K^m_n$. Then the set of vectors x in $K^n$ such that
ax = 0 is a subspace of dimension n − r, where r is the rank of a. The set of all
vectors ax (x arbitrary in $K^n$) is a subspace of $K^m$ of dimension r.
Proof. Let $A: K^n \to K^m$ be the mapping defined by A(x) = ax
(Proposition 6.2). Then the rank of A (= dim Im A) is r, by Theorem 8.2. The
nullity of A (= dim Ker A) is n − r, by Theorem 3.2. Q.E.D.


From these theorems we can easily deduce the so-called fundamental theorem on
linear equations. First of all, a system of linear equations

8.1    $a^1_1 x^1 + a^1_2 x^2 + \cdots + a^1_n x^n = y^1$
       $a^2_1 x^1 + a^2_2 x^2 + \cdots + a^2_n x^n = y^2$
       $\cdots$
       $a^m_1 x^1 + a^m_2 x^2 + \cdots + a^m_n x^n = y^m$

with coefficients in a field K is the same as the matrix equation

8.2    $ax = y$

where a is the m × n matrix $(a^i_j)$ and where x and y are the column vectors $(x^i)$
and $(y^i)$ in $K^n$ and $K^m$, respectively. Equations (8.1) and (8.2) are also the same
as the vector equation

8.3    $x^1 a_1 + x^2 a_2 + \cdots + x^n a_n = y$

where, as usual, $a_j$ denotes the jth column of a. It is plain that two elements x
and x' of $K^n$ are solutions of (8.2) only if a(x − x') = 0, that is, if and only if
x − x' is in the kernel of the linear mapping $A: K^n \to K^m$ determined by a as in
Proposition 6.2. Therefore all solutions of (8.2) [or (8.1), (8.3)] are obtained from
any one of them by adding to it any vector z such that az = 0. The solution of
(8.2), if one exists, is unique if and only if Ker A = 0, which is so if and only if a
has rank n, by Theorem 8.4.




From (8.3) it is clear that, for given y, a solution x exists if and only if y is in the
subspace C of $K^m$ spanned by the columns of a. If a has rank r, then, by definition,
C has dimension r; and some r of the columns of a form a base for C. For simplicity
of notation, suppose that $a_1, a_2, \dots, a_r$ are linearly independent. Rewrite
(8.3) as

8.4    $x^1 a_1 + \cdots + x^r a_r = y - (x^{r+1} a_{r+1} + \cdots + x^n a_n)$

Since $a_{r+1}, \dots, a_n$ are in the subspace C of $K^m$ spanned by $a_1, \dots, a_r$, it follows
that the right-hand side of this equation is in C if and only if y is in C. If that is so,
then for arbitrarily given elements $x^{r+1}, \dots, x^n$ in K there are elements $x^1, \dots,
x^r$ in K for which (8.4) holds. The latter set is uniquely determined, because
$a_1, \dots, a_r$ form a base for C. We have proved


THEOREM 8.5 Let r be the rank of the matrix a of the system (8.1). Then the system
has a solution x for given y if and only if y is in the r-dimensional subspace of $K^m$
spanned by the columns of a. If that is so, then some n − r of the $x^i$ can be prescribed
arbitrarily in K, and the remaining unknowns are then uniquely determined. All
solutions of (8.1) are obtained from any one of them by adding to it an arbitrary
vector z such that az = 0.

In Sec. 9 we shall develop an efficient method for solving systems of the type 
(8.1). 
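
The last statement of Theorem 8.5 can be illustrated by a small sketch. The system is an arbitrary example of ours with m = 2, n = 3 and rank 2, so that one unknown may be prescribed freely; the particular solution `x0` and the kernel vector `z` were found by hand and are assumptions of the example, not data from the text.

```python
def matvec(a, x):
    """Product of a matrix (list of rows) with a column vector (flat list)."""
    return [sum(a[i][j] * x[j] for j in range(len(x))) for i in range(len(a))]


a = [[1, 2, 1],
     [2, 4, 0]]
y = [4, 6]

x0 = [3, 0, 1]    # a particular solution, obtained with the second unknown set to 0
z = [-2, 1, 0]    # a vector with az = 0

# Every vector x0 + c*z is again a solution of ax = y.
solutions = [[x0[j] + c * z[j] for j in range(3)] for c in (-1, 0, 1, 5)]
print(matvec(a, z))
print([matvec(a, x) == y for x in solutions])
```

Since the kernel here has dimension n − r = 1, the vectors x0 + c·z with c ranging over K are all the solutions of this particular system.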


THEOREM 8.6 Let $w^1, w^2, \dots, w^s$ be vectors in $K_n$, and let V be the set of all vectors
x in $K^n$ such that† $w^j x = 0$ for j = 1, ..., s. Then V is an (n − r)-dimensional
subspace of $K^n$, where r is the rank of the matrix w whose rows are $w^1, \dots, w^s$.
Conversely, if V' is a q-dimensional subspace of $K^n$, then there exist n − q linearly
independent vectors $y^1, \dots, y^{n-q}$ in $K_n$ such that V' consists of all x for which
$y^j x = 0$ (j = 1, ..., n − q).

Proof. Clearly the equations $w^1 x = 0, \dots, w^s x = 0$ are equivalent
to the matrix equation wx = 0, where w is the s × n matrix whose rows
are $w^1, \dots, w^s$. Hence V consists of all x such that wx = 0, and
therefore V is an (n − r)-dimensional subspace of $K^n$, by Theorem 8.4. To
prove the remainder of the theorem, let $u_1, \dots, u_q$ be a base for V'.
By the argument just given, all vectors y in $K_n$ such that $y u_1 = 0, \dots,
y u_q = 0$ form an (n − q)-dimensional subspace W' of $K_n$. If $y^1, \dots,
y^{n-q}$ is any base for W', then those vectors will have the properties stated,
by what has already been shown. Q.E.D.


COROLLARY Let U be an n-dimensional vector space over K, and let $u_1, \dots, u_n$ be
a base for U. Let $w^1, \dots, w^{n-q}$ be linearly independent vectors in $K_n$. Then the
set of all elements x in U whose component vectors $x_u$ relative to the base $\{u_i\}$ satisfy
$w^1 x_u = 0, \dots, w^{n-q} x_u = 0$ form a q-dimensional subspace of U. Moreover, every
q-dimensional subspace is obtained in this way by a suitable choice of $w^1, \dots, w^{n-q}$.

† $w^j x$ is a scalar. Cf. Eq. (6.6).


This follows at once from the theorem above and from the fact that the
component mapping $x \to x_u$ is an isomorphism from U to $K^n$ (Theorem 9.2, Chap. 8).

In particular, an (n − 1)-dimensional subspace V of U is obtained by selecting
a nonzero vector w in $K_n$. The corresponding V then consists of all x in U such
that $w x_u = 0$, that is, such that

$w_1 x^1 + \cdots + w_n x^n = 0$


This way of determining (n — 1)-dimensional subspaces has already been used in 
connection with (10.15) of Chap. 8. A proof of the corollary is implicit in Exer- 
cise 5, Sec. 10, Chap. 8. 

As an application of Theorem 8.6 we give a criterion for the solvability of (8.1). 


THEOREM 8.7 The system (8.1) has a solution if and only if zy = 0 for every vector
z in $K_m$ such that za = 0.
Proof. If a solution x exists, and if za = 0, then zy = z(ax) = (za)x =
0, whence the necessity of our condition. Recall that ax = y has a
solution if and only if y is in the r-dimensional subspace C of $K^m$ spanned by
the columns of a. Now all vectors z in $K_m$ such that za = 0 form an
(m − r)-dimensional subspace W of $K_m$, by Theorem 8.4. Let $w^1, \dots,
w^{m-r}$ be a base for W. Then it is clear that zy = 0 for all z in W if and only
if $w^1 y = 0, \dots, w^{m-r} y = 0$. But all y in $K^m$ satisfying these last
equations form an r-dimensional subspace of $K^m$, by Theorem 8.6, and, as
pointed out above, that subspace must contain C, hence must coincide
with C because they both have dimension r. Q.E.D.
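
A small sketch may make the criterion concrete. The matrix a is an illustrative 3 × 2 example of rank 2, so the row vectors z with za = 0 form a space of dimension m − r = 1; the spanning vector `z`, the two right-hand sides, and the solution `x` were worked out by hand and are assumptions of the example.

```python
def matvec(a, x):
    """Product of a matrix (list of rows) with a column vector (flat list)."""
    return [sum(a[i][j] * x[j] for j in range(len(x))) for i in range(len(a))]


def vecmat(z, a):
    """Product of a row vector (flat list) with a matrix (list of rows)."""
    return [sum(z[i] * a[i][j] for i in range(len(a))) for j in range(len(a[0]))]


def dot(z, y):
    """The scalar zy for a row vector z and a column vector y."""
    return sum(zi * yi for zi, yi in zip(z, y))


a = [[1, 2],
     [2, 4],
     [1, 1]]
z = [-2, 1, 0]          # spans the row vectors z with za = 0

y_good = [1, 2, 3]      # zy = 0, so a solution must exist
y_bad = [1, 3, 3]       # zy = 1, so no solution can exist
x = [5, -2]             # a solution of ax = y_good

print(vecmat(z, a), dot(z, y_good), dot(z, y_bad))
print(matvec(a, x) == y_good)
```

For `y_bad` the failure is visible directly: the second row of a is twice the first, so any right-hand side of a solvable system must have its second component equal to twice its first.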


EXERCISES 

1. Fill in the details of the last part of the proof of Theorem 8.2. 

2. Do the same for Theorem 8.7. 

3. Prove that a matrix a in $K^n_n$ is singular if and only if there is a nonzero
element x in $K^n$ such that ax = 0.

4. Let a be a matrix of rank r in $K^m_n$. Show that it is possible to strike out
n − r columns and m − r rows in a in such a way that the remaining r × r matrix
is nonsingular.

5. What is the form of an n × n matrix of rank 1?

6. Prove that an n × n matrix of rank r is the sum of r n × n matrices of rank 1.


9. Reduction to diagonal form 


Let $\{u_1, \dots, u_n\}$ and $\{v_1, \dots, v_m\}$ be bases for vector spaces U and V over a
field K, and let a be the matrix of a linear mapping T: U → V relative to that
base pair, so that $T(u_i) = a^j_i v_j$. We shall now describe a procedure for finding
new bases for U and V for which the matrix of T has the diagonal form (7.5). This
will be accomplished by a series of simple base changes of three different types.




It will be convenient to denote bases by single letters, and we shall start off by
writing A for $\{u_i\}$ and B for $\{v_j\}$. We say then that a is the matrix of T relative
to (A, B).

The first type of base change that we consider consists simply of renumbering
the elements in a base. For example, let $A' = \{u'_1, \dots, u'_n\}$ denote the base for
U defined as follows: $u'_1 = u_2$, $u'_2 = u_1$, $u'_3 = u_3$, ..., $u'_n = u_n$. That is, A' is
obtained from A by reversing the numbering of $u_1$ and $u_2$. If a' denotes the matrix
of T relative to (A', B), then $T(u'_1) = a'^j_1 v_j$. But $T(u'_1) = T(u_2) = a^j_2 v_j$. Hence,
the column $a'_1$ of a' is equal to the column $a_2$ of a. Similarly, $a'_2 = a_1$; and all the
other columns in a' are the same as in a. In general, it is easy to see that renumbering
the base elements in U results in a permutation of the corresponding columns
in the matrix a. Similarly, renumbering the base elements in V is equivalent to a
permutation of the corresponding rows of a. We shall often use this observation
in what follows in order to move elements in a matrix to more desirable locations.

Now take up the second type of base change which we require. Let $B_1 = \{v'_j\}$
be the new base for V defined as follows:

9.1    $v'_j = v_j \quad \text{if } j \neq k$ (k = some fixed index)
       $v'_k = v_k + y^1 v_1 + \cdots + y^{k-1} v_{k-1} + y^{k+1} v_{k+1} + \cdots + y^m v_m$

Here only the kth base element has been changed. Of course $y^1, \dots, y^{k-1},
y^{k+1}, \dots, y^m$ denote arbitrarily selected scalars. We must first ascertain that
(9.1) really defines a change of base. But that is trivial, for (9.1) is clearly
equivalent to

9.2    $v_j = v'_j \quad \text{if } j \neq k$
       $v_k = v'_k - (y^1 v'_1 + \cdots + y^{k-1} v'_{k-1} + y^{k+1} v'_{k+1} + \cdots + y^m v'_m)$

which shows that the $v_j$ can be expressed in terms of the $v'_j$. Hence the latter must
span V. The matrix from $\{v_j\}$ to $\{v'_j\}$ is

9.3    $(e_1, \dots, e_{k-1}, y, e_{k+1}, \dots, e_m)$

where the $e_j$ are columns of $I_m$ and where y is the column vector whose elements
are $y^1, \dots, y^{k-1}, 1, y^{k+1}, \dots, y^m$. The matrix from $\{v'_j\}$ to $\{v_j\}$, that is, from
$B_1$ to B, is

9.4    $(e_1, \dots, e_{k-1}, y^*, e_{k+1}, \dots, e_m)$

where $y^*$ denotes the column vector whose elements are $-y^1, \dots, -y^{k-1}, 1,
-y^{k+1}, \dots, -y^m$. It is easily seen that (9.3) and (9.4) are inverse matrices.


Let us calculate the matrix of T relative to (A, $B_1$). Substituting from (9.2) in
$T(u_i) = a^j_i v_j$, we get

$T(u_i) = a^1_i v'_1 + \cdots + a^{k-1}_i v'_{k-1} + a^k_i (v'_k - y^1 v'_1 - \cdots - y^{k-1} v'_{k-1}$
$\qquad - y^{k+1} v'_{k+1} - \cdots - y^m v'_m) + a^{k+1}_i v'_{k+1} + \cdots + a^m_i v'_m$

and so

$T(u_i) = (a^1_i - y^1 a^k_i) v'_1 + \cdots + (a^{k-1}_i - y^{k-1} a^k_i) v'_{k-1} + a^k_i v'_k$
$\qquad + (a^{k+1}_i - y^{k+1} a^k_i) v'_{k+1} + \cdots + (a^m_i - y^m a^k_i) v'_m$

Denoting the matrix of T relative to (A, $B_1$) by b, we have the equation

$b^j_i = a^j_i - y^j a^k_i \quad (j \neq k)$
$b^k_i = a^k_i$

Hence, for the rows of b, we have

$b^j = a^j - y^j a^k \quad (j \neq k)$
$b^k = a^k$

We can therefore state the following rule:


(I) For a change of base in V of the form (9.1), the new matrix of T is obtained
from the old one by subtracting $y^j$ times row k from row j (j ≠ k), leaving
row k unaltered.


Such an operation on a matrix is called an elementary row operation. 
REMARK. The matrix (9.4) is obtained from $I_m$ by the operation of rule (I).


In the applications which we shall make of the rule above, the elements $y^j$ will
be chosen as follows: Suppose that the element $a^k_k$ (no summation here!) is not
zero and put

9.5    $v'_j = v_j \quad (j \neq k)$
       $v'_k = (1/a^k_k) \cdot T(u_k)$

That is,

$v'_k = (a^1_k/a^k_k) v_1 + \cdots + (a^{k-1}_k/a^k_k) v_{k-1} + v_k + (a^{k+1}_k/a^k_k) v_{k+1} + \cdots + (a^m_k/a^k_k) v_m$

Here we have taken $y^j = a^j_k/a^k_k$. For this special case, rule (I) becomes


(I') For a change of base in V of the form (9.5), the new matrix of T is obtained
from the old one by subtracting $a^j_k/a^k_k$ times row k from row j (j ≠ k), leaving
row k unaltered.


It is clear that in the new matrix all the elements in column k will be zero except
the one in the (k, k) place, which will still be $a^k_k$. By repeated operations of this
sort we shall obtain a base for V such that the matrix of T has as many zero
coefficients as possible.
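
The row operation of rule (I') can be sketched as a short routine. The function name `clear_column` and the sample matrix are our own illustrative choices; indices are counted from 0 in the code, whereas the text counts from 1, and exact rational arithmetic is used so that the quotients are not rounded.

```python
from fractions import Fraction


def clear_column(a, k):
    """Subtract (a[j][k] / a[k][k]) times row k from every other row j.

    Row k itself is left unaltered. The entry a[k][k] must be nonzero.
    """
    pivot = a[k][k]
    if pivot == 0:
        raise ValueError("the (k, k) entry must be nonzero")
    result = []
    for j, row in enumerate(a):
        if j == k:
            result.append(list(row))
        else:
            factor = row[k] / pivot
            result.append([x - factor * y for x, y in zip(row, a[k])])
    return result


# Illustrative 3 x 3 matrix with a nonzero entry in the first diagonal place.
a = [[Fraction(2), Fraction(1), Fraction(3)],
     [Fraction(4), Fraction(0), Fraction(1)],
     [Fraction(-2), Fraction(5), Fraction(2)]]

b = clear_column(a, 0)
for row in b:
    print([str(x) for x in row])
```

After the call, the first column of b is zero except for its first entry, which is still the original diagonal entry.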


We now take up the question of operations on the columns of a matrix. The 
operations will be of the same sort as those just discussed, but they correspond to a 
somewhat different kind of base change—this time in U, rather than V. 

Starting with a base pair A = $\{u_i\}$ and B = $\{v_j\}$, again let a be the
corresponding matrix of T: U → V. Define a new base $A_1 = \{u'_i\}$ in U by

9.6    $u'_k = u_k \quad (k \text{ fixed})$
       $u'_i = u_i - x_i u_k \quad (i \neq k)$

where $x_1, \dots, x_{k-1}, x_{k+1}, \dots, x_n$ are arbitrarily given scalars. The equations
of (9.6) are equivalent to

9.7    $u_k = u'_k$
       $u_i = u'_i + x_i u'_k \quad (i \neq k)$

showing that $\{u'_i\}$ must span U and is consequently a base for U. From (9.6) the
matrix from $\{u_i\}$ to $\{u'_i\}$ is

9.8    $\begin{pmatrix} e^1 \\ \vdots \\ e^{k-1} \\ x^* \\ e^{k+1} \\ \vdots \\ e^n \end{pmatrix}$

where the $e^i$ are rows of $I_n$ and where $x^* = (-x_1, \dots, -x_{k-1}, 1, -x_{k+1}, \dots,
-x_n)$. From (9.7) the matrix from $\{u'_i\}$ to $\{u_i\}$ is

9.9    $\begin{pmatrix} e^1 \\ \vdots \\ e^{k-1} \\ x \\ e^{k+1} \\ \vdots \\ e^n \end{pmatrix}$

where $x = (x_1, \dots, x_{k-1}, 1, x_{k+1}, \dots, x_n)$. The transposes of these matrices
are of the same type as (9.3) and (9.4). It is easily verified that (9.8) and (9.9) are
inverse matrices.
Now let us compute the matrix of T relative to ($A_1$, B). From (9.6) we get

$T(u'_k) = T(u_k) = a^j_k v_j$

and, for i ≠ k,

$T(u'_i) = T(u_i - x_i u_k)$
$\qquad = T(u_i) - x_i T(u_k)$
$\qquad = a^j_i v_j - x_i a^j_k v_j$

Letting c denote the matrix of T relative to ($A_1$, B), we have for the columns of c

$c_k = a_k \qquad c_i = a_i - x_i a_k \quad \text{for } i \neq k$

We can state the following rule:


(II) For a base change of the form (9.6) in U, the new matrix for T is obtained from
the old one by subtracting $x_i$ times column k from column i (i ≠ k), leaving
column k unaltered.
Such an operation on a matrix is called an elementary column operation. 


Remark. The matrix (9.8) is obtained from the identity matrix I, by the opera- 
tion of rule (II). 

In our applications of rule (II) we shall choose the scalars x, as follows: Suppose 
that the element a*, (no sum here) is not zero, and put 2, = a‘,/a',. That is, we 
take for (9.6) the equations 


us= uy (k fixed) 
w= ur — (at /aky uy Gz 


9.10 


For this special choice rule (II) becomes rule (II′) as follows:

(II′) For a change of base in $U$ of the form (9.10), the new matrix of $T$ is obtained from the old one by subtracting $a^k_i / a^k_k$ times column $k$ from column $i$ ($i \neq k$), leaving column $k$ unaltered.

In the resulting matrix all elements in the $k$th row will be zero except the element in the $(k, k)$ place, which will still be $a^k_k$.
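Rules (II) and (II′) can be tried out numerically. The following is a minimal sketch, assuming a matrix is stored as a list of rows with exact rational entries; the helper name `column_op` and the small 2 × 3 matrix are illustrative choices, not taken from the text.

```python
from fractions import Fraction

def column_op(a, k, x):
    """Rule (II): subtract x[i] times column k from column i, for every i != k.

    a is a list of rows, k is a 0-based column index, and x holds one
    scalar per column (x[k] is ignored, so column k is left unaltered).
    """
    return [
        [row[i] if i == k else row[i] - x[i] * row[k] for i in range(len(row))]
        for row in a
    ]

# Illustrative matrix (hypothetical data).
a = [[Fraction(v) for v in row] for row in [[2, 4, 6], [1, 3, 5]]]

# Special choice of rule (II'): x_i = a^k_i / a^k_k, here with k = 0 (0-based).
k = 0
x = [a[k][i] / a[k][k] for i in range(3)]
c = column_op(a, k, x)
print(c)
```

With this choice of scalars, row `k` of the result is zero except in the `(k, k)` place, and column `k` is the same as before.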


Now let us apply these operations to the matrix a of T relative to the base pair 
(A, B) in order to reduce it to a more agreeable form. We start with elementary 
row operations, that is, rule (I′). Let r denote the rank of T, hence also the rank of
a, by Theorem 8.2. If r = 0, then a = 0, and there is little more to be said. We 
suppose then that r > 0. At least one element of a must be different from zero; 
and by suitably renumbering the base elements—that is, by making a suitable 
permutation of rows and columns—we can get a nonzero element into the (1, 1) 
location. Thus we can assume that $a^1_1 \neq 0$.

We make a base change of the form (9.5), taking $k = 1$:

$$\begin{aligned} v'_j &= v_j && (j \neq 1) \\ v'_1 &= (1/a^1_1) \cdot T(u_1) \end{aligned}$$

Calling this new base $B_1$, it follows from rule (I′) that the matrix $b$ of $T$ relative to $(A, B_1)$ has the form

$$b = \begin{pmatrix} b^1_1 & * & \cdots & * \\ 0 & * & \cdots & * \\ \vdots & \vdots & & \vdots \\ 0 & * & \cdots & * \end{pmatrix} \qquad (b^1_1 = a^1_1)$$

where the *'s denote the elements obtained from $a$ by (I′). By Theorem 8.2, $b$ must have rank $r$. If $r = 1$, then all the rows below the first one must be zero. For they must be multiples of the first row, and the only multiple of $b^1$ that has zero in the first entry is $0 \cdot b^1 = 0$, since $b^1_1 \neq 0$.

If $r > 1$, then we can repeat the process. For there must be a nonzero element below the first row of $b$, and by suitably renumbering the base elements $u_2, \ldots, u_n$ and $v_2, \ldots, v_m$, if necessary (not $u_1$ or $v_1$), we can put a nonzero element in the (2, 2) location without disturbing the first column. Therefore, if $r > 1$, we can assume that $b^2_2 \neq 0$. We now choose a new base $B_2 = \{v''_j\}$ in $V$ by setting

$$\begin{aligned} v''_j &= v'_j && (j \neq 2) \\ v''_2 &= (1/b^2_2) \cdot T(u_2) \end{aligned}$$

From (I′) it follows that the matrix of $T$ relative to $(A, B_2)$, call it $c$, has the form

$$c = \begin{pmatrix} c^1_1 & 0 & * & \cdots & * \\ 0 & c^2_2 & * & \cdots & * \\ 0 & 0 & * & \cdots & * \\ \vdots & \vdots & \vdots & & \vdots \\ 0 & 0 & * & \cdots & * \end{pmatrix}$$

where $c^1_1 = b^1_1$ and $c^2_2 = b^2_2$, whence $c^1_1$ and $c^2_2$ are not zero. Again the *'s indicate elements which we do not need to know explicitly.

The rank of $c$ is the same as the rank of $T$, therefore is equal to $r$. If $r = 2$, then every row below the second in $c$ must be zero. For the first two rows, $c^1$ and $c^2$, of $c$ are obviously linearly independent. If the rank is 2, then all the other rows can be expressed as linear combinations $xc^1 + yc^2$ of the first two. But $xc^1 + yc^2$ cannot have zero in the first two places, as required, unless $x = y = 0$.

If $r > 2$, then there are nonzero elements below the second row of $c$, and we can assume that $c^3_3 \neq 0$. The process can be repeated again, resulting in a new matrix for $T$ in which the first two columns are the same as above, the third column having all zeros except for $c^3_3$. If $r = 3$, then all the rows in this new matrix below the third must be zero. If $r > 3$, we can continue the process, arriving finally at a matrix for $T$ of the following form:

$$\begin{pmatrix} p_1 & 0 & \cdots & 0 & * & \cdots & * \\ 0 & p_2 & \cdots & 0 & * & \cdots & * \\ \vdots & \vdots & \ddots & \vdots & \vdots & & \vdots \\ 0 & 0 & \cdots & p_r & * & \cdots & * \\ 0 & 0 & \cdots & 0 & 0 & \cdots & 0 \\ \vdots & \vdots & & \vdots & \vdots & & \vdots \\ 0 & 0 & \cdots & 0 & 0 & \cdots & 0 \end{pmatrix} \tag{9.11}$$

with an $r \times r$ diagonal matrix in the upper left-hand corner, the elements $p_1, p_2, \ldots, p_r$ being nonzero, and with all rows below the $r$th equal to zero. The matrix (9.11) is of the kind that we have earlier called triangular.

By means of some further base changes the elements indicated by * can be made zero. Let us now define a new base $A_1 = \{u'_i\}$ in $U$ by

$$\begin{aligned} u'_1 &= u_1 \\ u'_i &= u_i - (p^1_i / p_1)\, u_1 && (i \neq 1) \end{aligned}$$

where $p^j_i$ denotes the $(j, i)$ element of (9.11) (except that $p^1_1, \ldots, p^r_r$ will be written also as $p_1, \ldots, p_r$). If $B^*$ denotes the final base obtained above for $V$, so that (9.11) is the matrix of $T$ relative to $(A, B^*)$, then from rule (II′) it follows that the matrix of $T$ relative to $(A_1, B^*)$ is the same as (9.11) except that the elements in its top row, other than $p_1$, are all zero.

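We can repeat the same process in the second row by defining the new base $A_2 = \{u''_i\}$ in $U$ by

$$\begin{aligned} u''_2 &= u'_2 \\ u''_i &= u'_i - (p^2_i / p_2) \cdot u'_2 && (i \neq 2) \end{aligned}$$

By (II′), the matrix of $T$ relative to $(A_2, B^*)$ is the same as the matrix of $T$ relative to $(A_1, B^*)$, save that all the elements in its second row, other than $p_2$, are zero. Repeating this same process for rows $3, \ldots, r$, we arrive finally at a base $A^*$ for $U$ such that the matrix of $T$ relative to $(A^*, B^*)$ has the diagonal form

$$\begin{pmatrix} p_1 & 0 & \cdots & 0 & 0 & \cdots & 0 \\ 0 & p_2 & \cdots & 0 & 0 & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots & \vdots & & \vdots \\ 0 & 0 & \cdots & p_r & 0 & \cdots & 0 \\ 0 & 0 & \cdots & 0 & 0 & \cdots & 0 \\ \vdots & \vdots & & \vdots & \vdots & & \vdots \\ 0 & 0 & \cdots & 0 & 0 & \cdots & 0 \end{pmatrix} \tag{9.12}$$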


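As a last step we can replace the $p_i$ by 1 by one further trivial base change. Namely, if $A^* = \{u^*_i\}$ and $B^* = \{v^*_j\}$, then (9.12) tells us that

$$\begin{aligned} T(u^*_i) &= p_i \cdot v^*_i && (i = 1, \ldots, r) \\ T(u^*_i) &= 0 && (i > r) \end{aligned}$$

If $B^{**} = \{v^{**}_j\}$ is defined by

$$\begin{aligned} v^{**}_j &= p_j v^*_j && (j = 1, \ldots, r) \\ v^{**}_j &= v^*_j && (j > r) \end{aligned}$$

then the matrix of $T$ relative to $(A^*, B^{**})$ consists of the $r \times r$ unit matrix in the upper left-hand corner, zeros elsewhere. This last step is usually not necessary.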


We carry this out in detail for a mapping $T$ of a five-dimensional space $U$ to a four-dimensional space $V$, the field of scalars being $Q$. Suppose then that for a given base pair $A = \{u_i\}$ and $B = \{v_j\}$ the mapping $T$ has matrix

$$\begin{pmatrix} 2 & 0 & -4 & -12 & 4 \\ 3 & 8 & -5 & -7 & -1 \\ 2 & 4 & -3 & -5 & 1 \\ -4 & -2 & 7 & 19 & -7 \end{pmatrix} \tag{9.13}$$


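According to the foregoing, the first reduction step consists of introducing the new base† $B_\alpha = \{v^\alpha_j\}$ defined by

$$\begin{aligned} v^\alpha_j &= v_j && (j \neq 1) \\ v^\alpha_1 &= (1/2) \cdot T(u_1) = \tfrac{1}{2} \cdot (2v_1 + 3v_2 + 2v_3 - 4v_4) \end{aligned} \tag{9.14}$$

By rule (I′) the matrix of $T$ relative to $(A, B_\alpha)$ is

$$\begin{pmatrix} 2 & 0 & -4 & -12 & 4 \\ 0 & 8 & 1 & 11 & -7 \\ 0 & 4 & 1 & 7 & -3 \\ 0 & -2 & -1 & -5 & 1 \end{pmatrix}$$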


The (2, 2) element here is not zero, and we can repeat the process. However, 
we can spare ourselves some fractions here by interchanging columns 2 and 3. 
That step is scarcely necessary, but we shall do it to illustrate the operation. 
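Switching columns 2 and 3 corresponds to reversing the numbering of $u_2$ and $u_3$. Therefore let $A_\alpha = \{u^\alpha_i\}$ be the base for $U$ defined by

$$u^\alpha_2 = u_3, \qquad u^\alpha_3 = u_2, \qquad u^\alpha_i = u_i \quad \text{for } i \neq 2, 3 \tag{9.16}$$

Then the matrix of $T$ relative to $(A_\alpha, B_\alpha)$ is

$$\begin{pmatrix} 2 & -4 & 0 & -12 & 4 \\ 0 & 1 & 8 & 11 & -7 \\ 0 & 1 & 4 & 7 & -3 \\ 0 & -1 & -2 & -5 & 1 \end{pmatrix} \tag{9.17}$$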


Our next step is to apply rule (I′) to the second column here. For that purpose we define a new base $B_\beta = \{v^\beta_j\}$ in $V$ by

$$\begin{aligned} v^\beta_j &= v^\alpha_j && \text{for } j \neq 2 \\ v^\beta_2 &= T(u^\alpha_2) = -4v^\alpha_1 + v^\alpha_2 + v^\alpha_3 - v^\alpha_4 \end{aligned} \tag{9.18}$$

(remember that we have made a base change in $U$, which is why $u^\alpha_2$ appears here instead of $u_2$). By (I′) the matrix of $T$ relative to $(A_\alpha, B_\beta)$, our most recent base pair, is

$$\begin{pmatrix} 2 & 0 & 32 & 32 & -24 \\ 0 & 1 & 8 & 11 & -7 \\ 0 & 0 & -4 & -4 & 4 \\ 0 & 0 & 6 & 6 & -6 \end{pmatrix}$$

† We shall use Greek letters instead of primes to distinguish the various bases, for otherwise the symbolism becomes unwieldy.

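We now work on the third column. Let $B_\gamma = \{v^\gamma_j\}$ be the base

$$\begin{aligned} v^\gamma_j &= v^\beta_j && (j \neq 3) \\ v^\gamma_3 &= (-\tfrac{1}{4}) \cdot T(u^\alpha_3) = -\tfrac{1}{4} \cdot (32v^\beta_1 + 8v^\beta_2 - 4v^\beta_3 + 6v^\beta_4) \end{aligned} \tag{9.20}$$

By (I′) the matrix of $T$ for $(A_\alpha, B_\gamma)$ is

$$\begin{pmatrix} 2 & 0 & 0 & 0 & 8 \\ 0 & 1 & 0 & 3 & 1 \\ 0 & 0 & -4 & -4 & 4 \\ 0 & 0 & 0 & 0 & 0 \end{pmatrix} \tag{9.21}$$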


We have achieved here the triangular form (9.11), and we conclude from it that 
the rank of T is 3, for the first three rows are obviously linearly independent. We 
point out that if we are interested only in finding the rank, then it is not necessary to keep track of the base changes.
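When only the rank is wanted, the elimination can be run directly on the matrix with no record of the bases. The following is a minimal sketch, assuming exact rational arithmetic and the matrix (9.13) of the worked example as read here; the function name `rank` is an illustrative choice. It uses ordinary row elimination with row interchanges, which is in the spirit of rule (I′) but is not claimed to reproduce the exact sequence of steps in the text.

```python
from fractions import Fraction

def rank(matrix):
    """Rank over Q, found by row elimination on a copy of the matrix."""
    rows = [[Fraction(v) for v in row] for row in matrix]
    ncols = len(rows[0]) if rows else 0
    r = 0  # number of pivots found so far
    for col in range(ncols):
        # Look for a nonzero entry in this column at or below row r.
        pivot = next((i for i in range(r, len(rows)) if rows[i][col] != 0), None)
        if pivot is None:
            continue
        rows[r], rows[pivot] = rows[pivot], rows[r]
        # Clear the rest of the column by subtracting multiples of the pivot row.
        for i in range(len(rows)):
            if i != r and rows[i][col] != 0:
                factor = rows[i][col] / rows[r][col]
                rows[i] = [u - factor * v for u, v in zip(rows[i], rows[r])]
        r += 1
    return r

# Matrix of T from the worked example.
a = [
    [2, 0, -4, -12, 4],
    [3, 8, -5, -7, -1],
    [2, 4, -3, -5, 1],
    [-4, -2, 7, 19, -7],
]
print(rank(a))
```

The value printed agrees with the rank 3 read off from the triangular form.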


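Let us see how the final base $B_\gamma$ in $V$ is related to the original base $B$. We read this off from (9.14), (9.18), (9.20):

$$\begin{aligned} v^\gamma_1 &= v_1 + \tfrac{3}{2}v_2 + v_3 - 2v_4 \\ v^\gamma_2 &= -4v_1 - 5v_2 - 3v_3 + 7v_4 \\ v^\gamma_3 &= -2v_2 - v_3 + \tfrac{1}{2}v_4 \\ v^\gamma_4 &= v_4 \end{aligned}$$

Hence the matrix from $B$ to $B_\gamma$ is

$$q = \begin{pmatrix} 1 & -4 & 0 & 0 \\ \tfrac{3}{2} & -5 & -2 & 0 \\ 1 & -3 & -1 & 0 \\ -2 & 7 & \tfrac{1}{2} & 1 \end{pmatrix} \tag{9.22}$$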
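The base changes $B \to B_\alpha \to B_\beta \to B_\gamma$ are given by certain matrices

$$v^\alpha_j = a^i_j v_i, \qquad v^\beta_j = b^i_j v^\alpha_i, \qquad v^\gamma_j = c^i_j v^\beta_i \tag{9.23}$$

and from (9.14), (9.18), (9.20) they are

$$a = \begin{pmatrix} 1 & 0 & 0 & 0 \\ \tfrac{3}{2} & 1 & 0 & 0 \\ 1 & 0 & 1 & 0 \\ -2 & 0 & 0 & 1 \end{pmatrix} \qquad b = \begin{pmatrix} 1 & -4 & 0 & 0 \\ 0 & 1 & 0 & 0 \\ 0 & 1 & 1 & 0 \\ 0 & -1 & 0 & 1 \end{pmatrix} \qquad c = \begin{pmatrix} 1 & 0 & -8 & 0 \\ 0 & 1 & -2 & 0 \\ 0 & 0 & 1 & 0 \\ 0 & 0 & -\tfrac{3}{2} & 1 \end{pmatrix}$$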

These are all instances of (9.14), and their inverses are given by (9.15). Now $q = abc$ (see Exercise 6, Sec. 7), and so $q^{-1}$ can be easily computed, since $q^{-1} = c^{-1}b^{-1}a^{-1}$, by the general rule for inverses. $q^{-1}$ can also be obtained by solving the equations preceding (9.22) for $v_1, v_2, v_3, v_4$.

Returning to (9.21), let us now complete the reduction to diagonal form by 
applying (II’) in the manner described in connection with (9.11). 

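The first step is to replace the base $A_\alpha$ of $U$ by $A_\beta = \{u^\beta_i\}$ defined as follows:

$$\begin{aligned} u^\beta_i &= u^\alpha_i && \text{for } i \neq 5 \\ u^\beta_5 &= u^\alpha_5 - 4u^\alpha_1 \end{aligned} \tag{9.25}$$

By (II) the effect of this is to subtract four times column 1 of (9.21) from column 5. Hence the matrix of $T$ relative to $(A_\beta, B_\gamma)$ is

$$\begin{pmatrix} 2 & 0 & 0 & 0 & 0 \\ 0 & 1 & 0 & 3 & 1 \\ 0 & 0 & -4 & -4 & 4 \\ 0 & 0 & 0 & 0 & 0 \end{pmatrix} \tag{9.26}$$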


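Now replace $A_\beta$ by $A_\gamma = \{u^\gamma_i\}$, where

$$\begin{aligned} u^\gamma_i &= u^\beta_i && i \neq 4, 5 \\ u^\gamma_4 &= u^\beta_4 - 3u^\beta_2 \\ u^\gamma_5 &= u^\beta_5 - u^\beta_2 \end{aligned} \tag{9.27}$$

By (II) the effect of this is to subtract three times column 2 from column 4 and one times column 2 from column 5. Hence the matrix of $T$ for $(A_\gamma, B_\gamma)$ is

$$\begin{pmatrix} 2 & 0 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 & 0 \\ 0 & 0 & -4 & -4 & 4 \\ 0 & 0 & 0 & 0 & 0 \end{pmatrix} \tag{9.28}$$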


Finally we change $A_\gamma$ to $A_\delta = \{u^\delta_i\}$ defined by

$$\begin{aligned} u^\delta_i &= u^\gamma_i && \text{for } i \neq 4, 5 \\ u^\delta_4 &= u^\gamma_4 - u^\gamma_3 \\ u^\delta_5 &= u^\gamma_5 + u^\gamma_3 \end{aligned} \tag{9.29}$$

By (II) the matrix of $T$ for $(A_\delta, B_\gamma)$ is

$$\begin{pmatrix} 2 & 0 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 & 0 \\ 0 & 0 & -4 & 0 & 0 \\ 0 & 0 & 0 & 0 & 0 \end{pmatrix} \tag{9.30}$$

which is indeed in diagonal form. As pointed out in relation to (9.18), if we replace the base $\{v^\gamma_j\}$ in $V$ by the new base $\{2v^\gamma_1, v^\gamma_2, -4v^\gamma_3, v^\gamma_4\}$, then the new matrix for $T$ will have the $3 \times 3$ unit matrix in the upper left-hand corner. We shall not bother with this minor point.

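Let us now see what information we can obtain from (9.30). That is the matrix of $T$ relative to $(A_\delta, B_\gamma)$, and so we have

$$\begin{aligned} T(u^\delta_1) &= 2v^\gamma_1 \\ T(u^\delta_2) &= v^\gamma_2 \\ T(u^\delta_3) &= -4v^\gamma_3 \\ T(u^\delta_4) &= T(u^\delta_5) = 0 \end{aligned} \tag{9.31}$$

From this it is clear (see Theorem 7.4) that Im $T$ is spanned by $v^\gamma_1, v^\gamma_2, v^\gamma_3$. That is, given $y$ in $V$, the equation $T(x) = y$ has a solution if and only if $y$ is a linear combination of $v^\gamma_1, v^\gamma_2, v^\gamma_3$. From the equations preceding (9.22) we see that, in terms of the original base $B$ in $V$, the image of $T$ has the following base:

$$\begin{aligned} &v_1 + \tfrac{3}{2}v_2 + v_3 - 2v_4 \\ &-4v_1 - 5v_2 - 3v_3 + 7v_4 \\ &-2v_2 - v_3 + \tfrac{1}{2}v_4 \end{aligned} \tag{9.32}$$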
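From (9.30) we find furthermore that Ker $T$ is spanned by $u^\delta_4$ and $u^\delta_5$. In order to express these in terms of the original base $A$ in $U$ we must find out how the bases $A$ and $A_\delta$ are related. The base changes involved were $A \to A_\alpha \to A_\beta \to A_\gamma \to A_\delta$ given by (9.16), (9.25), (9.27), (9.29). By working backward it is easy to find $A_\delta$ in terms of $A$:

$$\begin{aligned} u^\delta_1 &= u_1 \\ u^\delta_2 &= u_3 \\ u^\delta_3 &= u_2 \\ u^\delta_4 &= -u_2 - 3u_3 + u_4 \\ u^\delta_5 &= -4u_1 + u_2 - u_3 + u_5 \end{aligned} \tag{9.33}$$

The matrix from $A$ to $A_\delta$ is therefore

$$p = \begin{pmatrix} 1 & 0 & 0 & 0 & -4 \\ 0 & 0 & 1 & -1 & 1 \\ 0 & 1 & 0 & -3 & -1 \\ 0 & 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & 0 & 1 \end{pmatrix} \tag{9.34}$$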

The last two elements in (9.33) span Ker $T$. Denoting (9.13) by $a$ and (9.30) by $a'$, one verifies easily that $a' = q^{-1}ap$, as required by Theorem 7.2.
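The verification can be sketched in a few lines. The following is a minimal check, assuming the matrices (9.13), (9.22), (9.30), and (9.34) as read from the worked example; the helper `matmul` is an illustrative name. Rather than inverting $q$, it tests the equivalent equation $q a' = a p$.

```python
from fractions import Fraction as F

def matmul(x, y):
    """Product of two matrices stored as lists of rows."""
    return [
        [sum(x[i][k] * y[k][j] for k in range(len(y))) for j in range(len(y[0]))]
        for i in range(len(x))
    ]

# Matrix (9.13) of T relative to (A, B).
a = [
    [2, 0, -4, -12, 4],
    [3, 8, -5, -7, -1],
    [2, 4, -3, -5, 1],
    [-4, -2, 7, 19, -7],
]
# Matrix (9.22) from B to the final base in V.
q = [
    [F(1), F(-4), F(0), F(0)],
    [F(3, 2), F(-5), F(-2), F(0)],
    [F(1), F(-3), F(-1), F(0)],
    [F(-2), F(7), F(1, 2), F(1)],
]
# Matrix (9.34) from A to the final base in U.
p = [
    [1, 0, 0, 0, -4],
    [0, 0, 1, -1, 1],
    [0, 1, 0, -3, -1],
    [0, 0, 0, 1, 0],
    [0, 0, 0, 0, 1],
]
# Diagonal matrix (9.30).
a_prime = [
    [2, 0, 0, 0, 0],
    [0, 1, 0, 0, 0],
    [0, 0, -4, 0, 0],
    [0, 0, 0, 0, 0],
]

# a' = q^{-1} a p holds exactly when q a' = a p.
print(matmul(q, a_prime) == matmul(a, p))
```

The last two columns of `matmul(a, p)` are zero, which is the statement that the last two elements of (9.33) lie in Ker $T$.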


EXERCISES 
1. Compute $q^{-1}$ for the matrix (9.22) and carry out the verification of the equation $a' = q^{-1}ap$.
2. Compute the rank of the matrix 


1 0 4° -38 2 


-1 2 8 o 0 


Q 
2 
2 3-2 § -4 1 
2 
6 -2 2 3 1 4 


3. Let a linear mapping $T\colon U \to V$ have matrix


2 0 -2 4 
1 0 -2 38 
Oo o4 2 01 
6 4 -4 13 
2 4 -2 7 


with respect to a certain base pair $\{u_i\}$, $\{v_j\}$. Compute the rank, image, and kernel of $T$.
4, Given the system of linear equations 


Xi 4 3X2 — Xs + Xs= 7% 
2X, — 2X2 + X,- Xs =" 
xy + 2X,4+2K,4 4X; = ¥, 


bX.+ Xs+ Xi +6Xs = Vs 
determine what conditions must be satisfied by Yi, Ys, Ya, Y, in order that this 
system have a solution for X,,... , Xs. Determine all solutions for Y: = —8, 
¥:=9, ¥s = 12, ¥, = 5. 

5. Let $a = (a^j_i)$ be an $m \times n$ triangular matrix (that is, $a^j_i = 0$ if $i < j$) and suppose that $a^1_1, a^2_2, \ldots, a^r_r$ are different from zero. Prove that $a$ has rank $r$ if and only if every row below the $r$th is zero.

6. Compute the rank of 

2 @®4+1 @41" 
—8148 B-1 vw) 
e  t4+4 Pte 
where ¢ is an indeterminate over Q. 

7. Let $u_1, \ldots, u_n$ be a base for a vector space $U$ over a field $K$, and let $c = (c^j_i)$ be a triangular matrix in $K^n_n$. Prove that the vectors $v_i = c^j_i u_j$ form a base for $U$ if and only if $c^1_1 c^2_2 \cdots c^n_n \neq 0$.




8. Determine all solutions of the system
Bx ty —w +32 =3 
2 — dy +3w +2 =3 
2y + dw —2= 3 
the field of scalars being Zs (residue class notations as in Sec. 10, Chap. 2). 

9. Let $U$ be a four-dimensional vector space over the real field, and let $A = \{e_1, e_2, e_3, e_4\}$ be a base in $U$. Let $T$ be the endomorphism of $U$ whose matrix with respect to the base $A$ is

1 8 -2 5 
-1 1 -4 15 
2 4 -1 0 
4 10 -5 10, 
Find bases for the kernel and image of T. 


10. Quotient spaces 


Let $U$ be a vector space over a field $K$, and let $V$ be a subspace. For any element $x$ in $U$, we denote by


10.1 x+V 


the set of all elements in $U$ of the form $x + v$ with $v$ an element in $V$. The subset (10.1) contains the element $x$, since $x = x + 0$. The subset (10.1) is called a $V$-coset (or simply, a coset, when the subspace $V$ is fixed in any given discussion).


10.2 Every element of U is contained in one and only one V-coset. 


For suppose that $x$ is also contained in a coset $x' + V$. Then $x = x' + v_1$ with $v_1$ in $V$. We have $x + v = x' + (v_1 + v)$; and as $v$ runs through all the elements of the subspace $V$, so does $v_1 + v$. Hence $x + V = x' + V$. From this we conclude that two cosets $x + V$ and $x' + V$ either have no elements in common, or else are identical, the latter being true if and only if $x - x'$ is in $V$.

We can thus describe the coset x + V as the unique V-coset containing x. 
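Statement (10.2) can be checked by brute force in a small case. The following is a minimal sketch, assuming $U$ is the plane of pairs over the field of residue classes modulo 3 and $V$ is the subspace spanned by (1, 1); both choices, and the helper name `coset`, are illustrative and not from the text.

```python
from itertools import product

p = 3  # work over the field of residue classes modulo 3 (assumed example)
U = list(product(range(p), repeat=2))   # all 9 vectors of U
V = {(t % p, t % p) for t in range(p)}  # subspace spanned by (1, 1)

def coset(x):
    """The V-coset x + V, returned as a frozenset of vectors."""
    return frozenset(((x[0] + v[0]) % p, (x[1] + v[1]) % p) for v in V)

cosets = {coset(x) for x in U}

# Every element of U lies in exactly one coset.
print(all(sum(x in c for c in cosets) == 1 for x in U))
# (0, 1) and (1, 2) differ by (1, 1), an element of V, so their cosets agree.
print(coset((0, 1)) == coset((1, 2)))
```

Here the nine vectors of $U$ fall into three cosets of three elements each.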

Now let U/V denote the set whose elements are all the different cosets x + V. 
Define addition in U/V by 


$$(x + V) + (y + V) = (x + y) + V \tag{10.3}$$

Define scalar multiplication by

$$c(x + V) = (cx) + V \tag{10.4}$$

for any scalar $c$. These definitions make sense (despite the fact that they depend apparently on the arbitrary selection of the element $x$ from its coset). For if we had expressed the cosets $x + V$ and $y + V$ as $x' + V$ and $y' + V$, respectively, we would have $x = x' + v_1$ and $y = y' + v_2$ with $v_1$ and $v_2$ in $V$. But then

$$(x + y) + V = (x' + v_1 + y' + v_2) + V = (x' + y') + (v_1 + v_2) + V = (x' + y') + V$$

showing that (10.3) depends only on the cosets and not on the particular elements $x$ and $y$ used to represent the cosets. The same is true for (10.4). It is a routine matter to verify that $U/V$ with these operations is a vector space over the same field of scalars as $U$.
The mapping

$$S\colon x \to x + V$$

is called the projection of $U$ onto $U/V$. The definitions (10.3) and (10.4) show that the projection $S$ is a linear mapping; indeed $S$ is an epimorphism.

Let $T\colon U \to U$ be a linear mapping such that $T$ sends $V$ into $V$. In this situation, we call $V$ a $T$-stable subspace. We denote by $T_V$ the restriction of $T$ to $V$; that is, $T_V$ is the mapping of $V$ to $V$ given by

$$T_V(x) = T(x) \tag{10.5}$$

for any $x$ in $V$. It is clear that $T_V$ is a linear mapping; it is called the $V$-part of $T$.

It is a simple matter to verify, by the reasoning used above, that the coset $T(x) + V$ remains unchanged if we replace $x$ by $x + v_1$, with $v_1$ in $V$. It follows at once that

$$T_{U/V}\colon x + V \to T(x) + V \tag{10.6}$$

is a well-defined mapping of $U/V$ into $U/V$. From the definitions (10.3) and (10.4), one sees at once that $T_{U/V}$ is a linear mapping. It is called the $U/V$-part of $T$. The relation of $T_{U/V}$ to $T$ is best summed up by the relation

$$T_{U/V} \circ S = S \circ T \tag{10.7}$$

which is depicted in the diagram below.

$$\begin{array}{ccc} U & \xrightarrow{\;T\;} & U \\ \downarrow{\scriptstyle S} & & \downarrow{\scriptstyle S} \\ U/V & \xrightarrow{\;T_{U/V}\;} & U/V \end{array}$$

That is, the composition of the vertical and horizontal mappings yields a single result via either the upper or the lower circuit.

We shall now take up the matrix counterpart of the endomorphisms $T_V$ and $T_{U/V}$. As before, let $T$ be an endomorphism of the vector space $U$, and let $V$ be a $T$-stable subspace. We assume that $U$ is finite dimensional.

We select a base $B = \{e_1, e_2, \ldots, e_n\}$ for $U$ such that $e_1, e_2, \ldots, e_r$ is a base for $V$. Let $a = (a^j_i)$ denote the matrix of $T$ with respect to the base $B$. Then

$$T(e_i) = \sum_{j=1}^{n} a^j_i e_j \qquad (i = 1, \ldots, n)$$

The hypothesis that $V$ is $T$-stable implies that

$$T(e_i) = \sum_{j=1}^{r} a^j_i e_j \qquad (i = 1, \ldots, r)$$

that is,

$$a^j_i = 0 \qquad \text{for } i = 1, \ldots, r \text{ and } j = r + 1, \ldots, n \tag{10.8}$$

Thus the matrix $a$ has the block form

$$a = \begin{pmatrix} a' & b \\ 0 & a'' \end{pmatrix} \tag{10.8$'$}$$

where $a'$ and $a''$ are square $r \times r$ and $(n - r) \times (n - r)$ matrices, respectively, 0 is an $(n - r) \times r$ matrix of zeros, and $b$ is an $r \times (n - r)$ matrix. Let $B'$ denote the subset $\{e_1, \ldots, e_r\}$ of $V$. Let $B''$ denote the subset $\{S(e_{r+1}), \ldots, S(e_n)\}$ of $U/V$.

It is clear that $B'$ is a base of $V$ and that $a'$ is the matrix of the $V$ part $T_V$ with respect to $B'$.

What about the matrix $a''$?

We observe first that $B''$ is a base of $U/V$. For if $c_{r+1}, \ldots, c_n$ are any scalars such that

$$\sum_{j=r+1}^{n} c_j S(e_j) = 0 \qquad \text{then} \qquad 0 = S\Big(\sum_{j=r+1}^{n} c_j e_j\Big)$$

and consequently $\sum_{j=r+1}^{n} c_j e_j$ is in $V$, the kernel of $S$. Inasmuch as $e_1, \ldots, e_r$ form a base of $V$, it follows immediately that $c_{r+1} = \cdots = c_n = 0$. Set

$$e''_k = S(e_k) \qquad (k = r + 1, \ldots, n) \tag{10.9}$$

From Eq. (10.7) we find

$$T_{U/V}(e''_k) = \sum_{j=r+1}^{n} a^j_k e''_j$$

Hence $a''$ is the matrix of $T_{U/V}$ with respect to the base $B''$.


EXERCISES
1. Prove that the sum and product of two matrices in the block form (10.8′) are again of the same block form. What is the corresponding assertion about the $V$ part and $U/V$ part of two endomorphisms?


*2. Continue the notation of Sec. 10. Under what condition on the endomorphism $T$ can one find a base $B$ so that the submatrix $b$ in the block form (10.8′) consists entirely of zeros? Give an example of an endomorphism whose matrix has block form (10.8′) but for no base does its matrix have the form

$$\begin{pmatrix} a' & 0 \\ 0 & a'' \end{pmatrix}$$

where $a'$ and $a''$ have at least one row.

3. Let $T\colon U \to W$ be a linear mapping, and let $V$ be the kernel of $T$. Let $S$ denote the projection of $U$ onto $U/V$. Prove that there is a monomorphism $T^*\colon U/V \to W$ such that $T^* \circ S = T$.

4. Let $V$ be a subspace of the vector space $U$ and let $S\colon U \to U/V$ denote the projection of $U$ onto $U/V$. Prove: There exist linear mappings $R''\colon U/V \to U$ and $R'\colon U \to V$ such that

(a) $S \circ R'' = I''$, the identity mapping of $U/V$

(b) $R'' \circ S + I' \circ R' = I$, where $I$ is the identity mapping of $U$ and $I'\colon V \to U$ is the mapping $v \to v$ of $V$ into $U$.

5. Let $T$ be an endomorphism of a vector space $U$, and let $V$ be a $T$-stable subspace. Prove: There exists a base $B$ with respect to which the matrix of $T$ has the block form

$$\begin{pmatrix} a' & 0 \\ 0 & a'' \end{pmatrix}$$

$a'$ and $a''$ being the matrices of $T_V$ and $T_{U/V}$, respectively, if and only if there is a linear mapping $R''\colon U/V \to U$ satisfying

(a) $S \circ R'' = I''$, the identity mapping of $U/V$
(b) $T \circ R'' = R'' \circ T_{U/V}$


11. Modules


If, in the definition of a vector space over a field $K$ (cf. Sec. 2, Chap. 8), we replace the field $K$ by a ring $A$, all the conditions continue to make sense. The resulting system might be called a “vector space over a ring.” However, it is customary to use the term a module over a ring $A$ or an $A$-module. Many, but not all, of the theorems for vector spaces carry over to modules. We enumerate in this section some of the basic properties of modules, which will be needed in later chapters.

To summarize the definition, a module $M$ over a ring $A$ is an additive abelian group admitting scalar multiplication by elements of the ring $A$ and satisfying the conditions

$(a \cdot b) \cdot m = a \cdot (b \cdot m)$ for all $a, b$ in $A$ and $m$ in $M$

$a \cdot (m + n) = a \cdot m + a \cdot n$ for all $a$ in $A$ and $m, n$ in $M$

$(a + b) \cdot m = a \cdot m + b \cdot m$ for all $a, b$ in $A$ and $m$ in $M$

$1 \cdot m = m$ for all $m$ in $M$


where 1 is the identity element for multiplication in the ring $A$. One often denotes the scalar product $a \cdot m$ simply by $am$, where $a$ is an element of $A$ and $m$ an element of $M$.

A submodule $N$ of the module $M$ is a non-empty subset such that
(1) For any $x$ and $y$ in $N$, $x + y$ is in $N$.
(2) For any $x$ in $N$ and $a$ in $A$, $ax$ is in $N$.

A homomorphism of an $A$-module $M$ into an $A$-module $M_1$ is a mapping $f$ such that
(1) $f(x + y) = f(x) + f(y)$ for all $x, y$ in $M$
(2) $f(ax) = af(x)$ for all $a$ in $A$ and $x$ in $M$

The kernel of a homomorphism $f$ is the set of elements which $f$ sends into the zero element. A homomorphism $f$ is a monomorphism (sending distinct elements into distinct elements) if and only if the kernel of $f$ consists of zero alone. A base in the $A$-module $M$ is a subset $\{m_1, m_2, \ldots, m_n, \ldots\}$ such that each element $x$ of $M$ can be expressed uniquely as a finite linear combination of base elements:

$$x = a_1 m_1 + a_2 m_2 + \cdots \tag{11.1}$$

each $a_i$ being in the ring $A$. By contrast with vector spaces over fields, not all modules have a base.


EXAMPLE 1 Let $G$ be an abelian group, the group operation being denoted by +. Then we can consider $G$ as a module over the ring of integers $Z$ upon defining scalar multiplication as: $a \cdot x$ = the $a$th multiple of $x$, for any integer $a$ (cf. Sec. 7F, Chap. 2, for the definition of “multiple”). Equations (7.23) to (7.25), Sec. 7F, Chap. 2, assert in effect that $G$ is indeed a module over the ring $Z$. A submodule is merely a subgroup in this case.

Suppose now that $G$ is a finite abelian group. Then $G$ cannot have a base as $Z$-module. For if $m$ were a base element, the elements $1 \cdot m, 2 \cdot m, 3 \cdot m, \ldots, a \cdot m, \ldots$ would all be distinct, which is impossible.
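The obstruction can be seen concretely. The following is a minimal sketch, assuming $G$ is the additive group of residue classes modulo 6 (an illustrative choice); the helper `multiples` is a hypothetical name. It lists the multiples $a \cdot m$ and shows that they repeat, so the coefficient $a$ in an expression $a \cdot m$ is never unique.

```python
def multiples(m, n, count):
    """The multiples 1*m, 2*m, ..., count*m in the additive group Z_n."""
    return [(a * m) % n for a in range(1, count + 1)]

n = 6
# The first 8 multiples of the element 1 of Z_6.
print(multiples(1, n, 8))

# For every m, the infinitely many multiples a*m fall into at most n values,
# so two different integers a give the same element a*m.
print(all(len(set(multiples(m, n, 100))) <= n for m in range(n)))
```

Since $7 \cdot m = 1 \cdot m$ for every $m$ in this group, no element can serve as a base element.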

Let $M$ be a module over a ring $A$, and let $N$ be a submodule. We define the quotient module $M/N$ exactly as in the vector-space case of Sec. 10: an element of $M/N$ is a coset $x + N$. Exactly as in the vector-space case, one sees that $M/N$ is a module over the ring $A$. Furthermore, one defines the projection of $M$ onto $M/N$ as the mapping $x \to x + N$ which assigns to each element $x$ of $M$ the coset containing $x$. The projection of $M$ onto $M/N$ is seen to be a homomorphism whose kernel is $N$. If $T$ is an endomorphism of $M$ (i.e., a homomorphism of $M$ into $M$) which sends $N$ into $N$, we call $N$ a $T$-stable submodule. One defines the $N$ part and $M/N$ part of $T$ by means of Eqs. (10.5) and (10.6). Both $T_N$ and $T_{M/N}$ are endomorphisms and we have

$$T_{M/N} \circ \pi = \pi \circ T \qquad (\pi \text{ denotes the projection } M \to M/N) \tag{11.2}$$


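More generally, let $M_1$ and $M_2$ be modules over a ring $A$; let $T$ be a homomorphism of $M_1$ to $M_2$; and let $N_1$, $N_2$ be submodules of $M_1$, $M_2$, respectively, with $T(N_1) \subset N_2$. Then the formula

$$T''(x + N_1) = T(x) + N_2 \tag{11.3}$$

yields a well-defined homomorphism $T''$ of the quotient module $M_1/N_1$ into $M_2/N_2$; $T''$ is called the $M_1/N_2$ part of $T$. Its relation to $T$ is given in the diagram:

$$\begin{array}{ccc} M_1 & \xrightarrow{\;T\;} & M_2 \\ \downarrow{\scriptstyle \pi_1} & & \downarrow{\scriptstyle \pi_2} \\ M_1/N_1 & \xrightarrow{\;T''\;} & M_2/N_2 \end{array}$$

that is, $\pi_2 \circ T = T'' \circ \pi_1$, where $\pi_1$, $\pi_2$ denote the projections of $M_1$, $M_2$ onto the corresponding quotient modules.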

The significance of the quotient module $M/N$ can best be explained by its following property:

(11.4) Let $f\colon M \to M_0$ be a homomorphism and let $N$ be a submodule contained in the kernel of $f$. Then there is a unique homomorphism $f''\colon M/N \to M_0$ such that

$$f = f'' \circ \pi$$

where $\pi$ is the projection of $M$ onto $M/N$.

Assertion (11.4) follows immediately from (11.3) with $N_2 = 0$. The assertion can be depicted by the diagram

$$\begin{array}{ccc} M & \xrightarrow{\;f\;} & M_0 \\ \downarrow{\scriptstyle \pi} & \nearrow{\scriptstyle f''} & \\ M/N & & \end{array}$$

There is one important special case of (11.4) deserving separate mention, namely, when $N$ is the kernel of $f$ and $f$ is an onto-mapping. In this case the kernel of the homomorphism $f''$ is seen to reduce to the zero coset alone. Hence $f''$ is a monomorphism. Since the image of $f''$ coincides with the image of $f$, the mapping $f''$ is a one-to-one mapping. Hence $f''$ is an isomorphism. To sum up,

(11.5) $f''$ is an isomorphism of $M/N$ onto $M_0$ if $f$ is an epimorphism with kernel $N$.


EXAMPLE 2 Let $Z$ be the additive group of integers considered as a $Z$-module. Let $mZ$ be the submodule consisting of all the multiples of the fixed positive integer $m$. Then an element of the quotient module is precisely a residue class modulo $m$ (cf. Sec. 10, Chap. 2) and the quotient module is the $Z$-module formed from the group $Z_m$ of residue classes modulo $m$.

In view of the intimate similarity in the definition of modules and vector spaces, 
many of the theorems proved in Chap. 8 for vector spaces are also valid for mod- 
ules. For example all the results in Secs. 8 to 5, Chap. 8. The results of Sec. 5 




on the direct sum of submodules are valid. However, the results in Sec. 6, Chap. 2, 
on linear independence are definitely not true for modules in general. 

The analogy between modules and vector spaces can be carried further for those 
special modules which have a base; such modules are called free modules. 


EXAMPLE 3 Let $A$ be a ring, and let $A^n$ denote the set of all $n$-tuples $(a_1, \ldots, a_n)$ of elements of $A$. Defining addition and scalar multiplication componentwise [cf. (4.1) and (4.2), Chap. 8], we find that $A^n$ is a module over the ring $A$. Let $e_i$ denote the element of $A^n$ whose $i$th component is 1 and whose other components are 0. Then $(a_1, a_2, \ldots, a_n) = b_1e_1 + b_2e_2 + \cdots + b_ne_n$ if and only if $b_1 = a_1, b_2 = a_2, \ldots, b_n = a_n$. Hence $e_1, \ldots, e_n$ is a base of $A^n$ and $A^n$ is a free $A$-module. In the special case that $n = 1$, a submodule of $A$ is called an ideal (or a left ideal in case $A$ is not commutative). An ideal of the form $Ax$ with $x$ a fixed element in $A$ is called a principal ideal.

Given a homomorphism $f\colon M \to N$ of a free module $M$ into a free module $N$, one can associate with $f$ a matrix exactly as in the case of vector spaces. Explicitly, if $\{x_1, \ldots, x_m\}$ and $\{y_1, \ldots, y_n\}$ are bases in $M$ and $N$, respectively, then the matrix of $f$ with respect to these bases is the matrix $a = (a^j_i)$ given by

$$f(x_i) = \sum_{j=1}^{n} a^j_i y_j \qquad (i = 1, \ldots, m) \tag{11.6}$$

In the special case that $M = N$ and the above bases are the same, we call $a$ the matrix of $f$ with respect to the base $\{x_1, \ldots, x_n\}$.

Let $M$ be a free $A$-module with base $B = \{x_1, \ldots, x_n\}$. Then any homomorphism $f\colon M \to N$ of $M$ into a module $N$ (not necessarily free) is determined entirely by the effect of the mapping $f$ on the elements of the base $x_1, \ldots, x_n$. For given any element $x$ in $M$, we can express $x$ as a linear combination $x = a_1x_1 + \cdots + a_nx_n$ in one and only one way, and therefore $f(x) = a_1f(x_1) + \cdots + a_nf(x_n)$ is uniquely determined by $f(x_1), \ldots, f(x_n)$. Moreover given arbitrary elements $y_1, \ldots, y_n$ in the $A$-module $N$, there exists a homomorphism $f$ of $M$ to $N$ such that $f(x_1) = y_1, \ldots, f(x_n) = y_n$; namely, define $f$ by

$$\sum_i a_i x_i \to \sum_i a_i y_i$$


EXERCISES 
1. Let A be a commutative ring. Define an A-module structure on Hom (M, 
N), the set of all homomorphisms of the A-module M into the A-module N. 
2. Let $V$ be a vector space over a field $K$, and set $A$ = Hom $(V, V)$. Define an $A$-module structure on $V$.
3. Let $A$ be a ring and let $A'$ denote its opposite; that is, its elements are the elements of $A$ but the multiplication $\circ$ in the ring $A'$ is given by $a \circ b = ba$ for any $a, b$ in $A'$. Let $M$, $N$ be $A$-modules and define scalar multiplication $a \cdot f$ in Hom $(M, N)$ by $(a \cdot f)(x) = f(ax)$ for any $a$ in $A$ and $f$ in Hom $(M, N)$.

(a) With this definition, does Hom $(M, N)$ become a module over $A$ if $A$ is noncommutative?

(b) Prove that Hom $(M, N)$ becomes a module over the opposite ring $A'$.

(c) Define scalar multiplication in $A^n$ by

$$a \circ (a_1, \ldots, a_n) = (a_1a, \ldots, a_na)$$

Prove that with this scalar product, $A^n$ becomes an $A'$-module.

4. Prove that every module is isomorphic to a quotient module of a free module. 

5. Let $A$ be a ring. We regard $A = A^1$ as an $A$-module as in Example 3. Prove that $A$ is a free $A$-module.

6. Let A be a ring. Prove: If every ideal of A is a free submodule having a 
base with one element, then A satisfies the cancellation law. 

7. Let A be a commutative ring. Prove: If every ideal in A is a free A-sub- 
module, then A is an integral domain in which every ideal is principal. 

8. Let $K[t]$ be the ring of polynomials in an indeterminate $t$ with coefficients in the field $K$. Prove that every ideal in $K[t]$ is principal (cf. Exercise 2, Sec. 3, Chap. 6).

9. Let $B$ be an ideal in the ring $A$. Prove: In the quotient module $A/B$, there is a unique multiplication such that the projection $\pi\colon A \to A/B$ becomes a ring homomorphism if and only if $xB$ lies in $B$ for every $x$ in $A$ (that is, $B$ is a so-called right ideal as well as a left ideal). The ring $A/B$ is called the quotient ring.

10. Let $A$ be a commutative integral domain in which every ideal is principal. Prove that every submodule of a free $A$-module $M$ is free. [Hint: Consider the homomorphism $M \to A$

$$\sum_j a_j e_j \to a_1$$

of the module onto the coefficient of the first base element.]

11. Let $A = K[t]$ be the ring of polynomials in an indeterminate $t$, and let $q(t)$ be a polynomial in $t$ of degree $n$. Let $B$ be the principal ideal $Aq(t)$, and $\pi\colon A \to A/B$ denote the projection of $A$ onto the quotient ring $A/B$. Prove: As a vector space over the field $K$, $A/B$ has the base $\pi(1), \pi(t), \pi(t^2), \ldots, \pi(t^{n-1})$.

[Hint: Consider the remainder of a polynomial after division by $q(t)$.]


10 
Groups and permutations 


1. Introduction 


Groups have played a basic role in all the preceding chapters, but as parts of 
more complex systems. Here we shall briefly take up the study of groups for them- 
selves in order to point out some of the fundamental facts concerning them. In 
particular, we shall see that the system of integers Z plays a fundamental part in 
the theory of groups. That theory is very extensive, and we shall be able to touch 
on only a few points here. The only part of this chapter required later is Sec. 3. 


2. Basic properties


We recall from Chap. 1 that a group $G$ is a set of elements with an associative binary operation, which we shall usually write here in the product notation, with the property that $G$ contains an identity element $e$ for the operation and $G$ contains the inverse $a^{-1}$ of every element $a$. We have $ea = ae = a$ for every element $a$ in $G$, and $a^{-1}a = aa^{-1} = e$. The element $e$ is unique, and the inverse $a^{-1}$ is uniquely determined by $a$.

A subgroup of $G$ is a non-empty subset, say $H$, such that if $a$ and $b$ are any elements of $H$, then $ab$, $a^{-1}$, $b^{-1}$ are also elements of $H$ (see Sec. 6, Chap. 1). It follows that $H$ is itself a group, and it has the same identity element $e$ as does $G$. The identity element $e$ forms by itself a subgroup of $G$. If $H$ is a subgroup of $G$ and if $H'$ is a subgroup of $H$, then $H'$ is also a subgroup of $G$.

A mapping $f$ from a group $G$ to a group $G'$ is called a homomorphism if $f(ab) = f(a) \cdot f(b)$ for any elements $a$, $b$ of $G$. That is, if $f$ sends $a$ into $a'$ and $b$ into $b'$, then it must send $ab$ into $a'b'$. It follows that $f(e) = e'$ = identity of $G'$; and if $f(a) = a'$, then $f(a^{-1}) = a'^{-1}$ (Theorem 7.1, Chap. 1). If $f$ is one-to-one, then it is called an isomorphism, and in that case $f^{-1}$, the inverse mapping from $G'$ to $G$, is also an isomorphism. (If $G = G'$, then an isomorphism is also called an automorphism.)

The group consisting of the system of integers $Z$ with the + operation will occur frequently in what follows, and we shall call it the additive group of integers. It will be denoted by $Z^+$ in order to distinguish it from $Z$ considered as an integral domain. Similarly, for any positive integer $m$, we denote by $Z_m^+$ the group consisting of the residue classes of integers modulo $m$, with addition as the group operation [Eq. (10.1), Chap. 2].


THEOREM 2.1 Let $H$ be a subgroup of the additive group of integers $Z^+$. If $H \neq 0$, then there is a unique positive integer $d$ in $H$ such that $H$ consists of all multiples $qd$ of $d$ ($q = 0, \pm 1, \pm 2$, etc.). The mapping of $Z^+$ to $H$ defined by $q \to qd$ is an isomorphism.
Proof. If $H$ does not consist of 0 alone, then it must contain positive integers. For if $a$ is any nonzero element of $H$, then $-a$ is also in $H$, and one of the two is positive. Let $d$ be the least positive integer in $H$ (Theorem 4.2, Chap. 2). For any integer $m$ the integer $md$ is also in $H$, as follows from a simple induction. Now let $n$ be any positive integer in $H$. By the division algorithm (Proposition 9.4, Chap. 2) there exist integers $q$ and $r$ such that $n = qd + r$ and $0 \leq r < d$. By the remark above, $qd$ is in $H$, and therefore so is $n - qd = r$, since $H$ is a subgroup of $Z^+$. Therefore $r = 0$, by definition of $d$; and so we have $n = qd$. The rest of the theorem follows easily.
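A small numerical illustration of Theorem 2.1 follows. It is a minimal sketch, assuming $H$ is the subgroup generated by 12 and 18 (an illustrative choice) and sampling only finitely many of its elements; the helper `subgroup_window` is a hypothetical name. The comparison with the greatest common divisor is a standard fact that the passage itself does not state.

```python
from math import gcd

def subgroup_window(a, b, bound):
    """Elements x*a + y*b of the subgroup of Z generated by a and b,
    restricted to coefficients with |x| <= bound and |y| <= bound."""
    window = range(-bound, bound + 1)
    return {x * a + y * b for x in window for y in window}

h = subgroup_window(12, 18, 5)
d = min(n for n in h if n > 0)  # least positive element found in the sample

print(d, gcd(12, 18))
# Every sampled element is a multiple qd of d, as the theorem asserts.
print(all(n % d == 0 for n in h))
```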


THEOREM 2.2 Let $a$ be an element of a group $G$ (with group operation written
multiplicatively). Then the mapping $Z^+ \to G$ that sends an arbitrary integer $n$ into $a^n$
is a homomorphism. The set of all the elements $a^n$ is an abelian subgroup of $G$,
denoted by $(a)$ and called the cyclic subgroup generated by $a$. If $a^m \neq a^n$ whenever
$m \neq n$, then the mapping $n \to a^n$ is an isomorphism of $Z^+$ to $(a)$. In the contrary
case, there is a unique positive integer $d$ such that $a^d = e$ and such that $a^m = e$ if and
only if $m$ is divisible by $d$; and then $(a)$ is isomorphic to $Z_d^+$. The integer $d$ is called
the order of $a$.
Proof. The mapping $n \to a^n$ was discussed in detail in Sec. 7E, Chap.
2. The fact that it is a homomorphism is the assertion of (7.11) of Chap. 2,
and from the same equation† it follows that $(a)$ is an abelian subgroup of
$G$. If $a^m \neq a^n$ whenever $m \neq n$, then the mapping is a one-to-one
homomorphism of $Z^+$ to $(a)$, that is, is an isomorphism. Suppose then that
$a^m = a^n$ for two distinct integers $m$ and $n$. There follows $a^{-m} \cdot a^m =
a^{-m} \cdot a^n$, or $e = a^{n-m}$; and $n - m \neq 0$. Now all integers $m$ such that
$a^m = e$ form a subgroup $H$ of $Z^+$, as follows at once from (7.11), Chap. 2,
and $H$ cannot consist of 0 alone, as was just shown. By Theorem 2.1,
$H$ consists of all multiples of a uniquely determined positive integer $d$.
That is, $a^m = e$ if and only if $m$ is a multiple of $d$, that is, if and only if
$d \mid m$. Therefore $a^m = a^n$ if and only if $d \mid n - m$, or, in other words, if and
only if $m \equiv n \pmod{d}$. From this it is easy to see that $(a)$ is isomorphic
to $Z_d^+$. Q.E.D.
REMARK. Observe that in the case just discussed the subgroup $(a)$ consists of the
$d$ distinct elements $e, a, a^2, \ldots, a^{d-1}$.
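
The order of an element and its cyclic subgroup can be computed directly in a small case. The sketch below works in the group of invertible residues modulo 7 under multiplication, which is an illustrative choice of ours and not an example from the text; the function name is likewise hypothetical.

```python
def order_and_cyclic_subgroup(a, modulus):
    """Return (d, elements) for a in the multiplicative group mod `modulus`.

    Assumes gcd(a, modulus) == 1, so that a is invertible and some
    positive power of a equals the identity 1.
    """
    elements = [1]              # e = a^0
    power = a % modulus
    while power != 1:
        elements.append(power)  # a, a^2, ..., a^(d-1)
        power = (power * a) % modulus
    return len(elements), elements

d, subgroup = order_and_cyclic_subgroup(2, 7)
# a^m = e exactly when d divides m, as in Theorem 2.2.
divisibility_ok = all((pow(2, m, 7) == 1) == (m % d == 0) for m in range(1, 30))
```

The list returned is the sequence $e, a, a^2, \ldots, a^{d-1}$ of the Remark.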


† It is pointed out in Sec. 7E, Chap. 2, that (7.11) holds for all integers, not just for positive
integers.




DEFINITION 2.1 A group $G$ is called a cyclic group if it is isomorphic to one of the
groups $Z_m^+$ for some $m$, or else to $Z^+$. In the latter case $G$ is called infinite cyclic.


EXERCISES†

1. Let $f$ be a homomorphism from a group $G$ to a group $G'$. Let $H$ be the subset
of $G$ consisting of all elements of $G$ that are sent by $f$ into the identity element $e'$
of $G'$ ($H$ is called the kernel of $f$). Further, let $H'$ be the subset of $G'$ consisting of
all elements $f(a)$, where $a$ is an arbitrary element of $G$ ($H'$ is called the image of $f$).
Prove that $H$ is a subgroup of $G$ and that $H'$ is a subgroup of $G'$.

2. Let $E$ be an arbitrary set of elements. Let $S$ be the set of all one-to-one
mappings of $E$ to itself. Prove that $S$, with composition of mappings as the binary
operation, is a group. What is its identity element? What is the inverse of an
element of $S$?

3. Let $c$ be an element of a group $G$. Prove that the mapping of $G$ to itself that
sends an arbitrary element $x$ of $G$ into $c^{-1}xc$ is an isomorphism. (The operation
$x \to c^{-1}xc$ is called conjugation by $c$; the various isomorphisms of $G$ to $G$ obtained
in this way are called inner automorphisms of $G$.)

4. Let $G$ be a finite group (i.e., containing only finitely many elements). Let
$a$ be an element of $G$, and let $H$ be the subset consisting of all the elements $a$, $a^2$,
$a^3$, $a^4$, etc. Prove that $H$ is a subgroup of $G$.

5. Let $G$ be a group. In $G$ define a new binary operation $*$ by the rule $a * b =
ba$, the right-hand side denoting the operation in $G$. Prove that $G$ with the
operation $*$ is also a group (called the opposite group of $G$).

6. Let $f$ be a homomorphism from a group $G$ to a group $G'$. Let $H$ be a subgroup
of $G$, and let $f(H)$ denote the set of all elements $f(x)$, where $x$ is in $H$. Prove that
$f(H)$ is a subgroup of $G'$. Let $K$ be a subgroup of $G'$, and let $f^{-1}(K)$ denote the
set of all elements $x$ in $G$ such that $f(x)$ is in $K$. Prove that $f^{-1}(K)$ is a subgroup
of $G$.


3. Permutations 


We now turn our attention to some special groups of great importance. First we
recall that a permutation of a set $E$ is a one-to-one mapping from $E$ to itself. Let
us write $S(E)$ for the set of all permutations of $E$. Referring to Theorem 6.3,
Chap. 1, we state the following theorem:


THEOREM 3.1 For any set $E$, all the permutations of $E$, with composition of
mappings as binary operation, form a group $S(E)$. Its identity element $e$ is the identity
mapping of $E$; and for any element $s$ in $S(E)$ the inverse group element is the inverse
mapping $s^{-1}$ of $E$. If $E$ contains $n$ elements, then $S(E)$ contains $n!$ elements.


† Some of these have been repeated from the first chapter.




We recall again that composition of mappings is defined as follows:‡ If $s$ and $s'$
are any elements of $S(E)$, then $ss'$ is the mapping that sends an arbitrary element
$x$ of $E$ into $s(s'(x))$. That is,

(3.1)    $ss'(x) = s(s'(x))$

To verify the last part of the theorem, let $x_1, x_2, \ldots, x_n$ be the elements of
$E$ in some fixed order. If we try to define a one-to-one mapping $s$ of $E$, then there
are $n$ possible choices for $s(x_1)$, since it may be any one of the elements $x_1, x_2,
\ldots, x_n$. Making now some choice for $s(x_1)$ we see that there remain only $n - 1$
choices for $s(x_2)$, because $s$ is to be one-to-one, and therefore we cannot allow
$s(x_2)$ to be the same as $s(x_1)$. Fixing one of the $n - 1$ choices for $s(x_2)$, we have
only $n - 2$ choices for $s(x_3)$, and so on. The total number of ways in which $s$ can
be defined is therefore $n(n - 1)(n - 2) \cdots 2 \cdot 1 = n!$ (the argument here
contains a concealed induction).
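
The count can be confirmed by brute force for small $n$. The following Python sketch simply lists the one-to-one mappings with the standard library and counts them; the function name is an assumption chosen for illustration.

```python
from itertools import permutations
from math import factorial

def count_one_to_one_mappings(elements):
    """Count the one-to-one mappings of a finite set to itself by listing them."""
    return sum(1 for _ in permutations(elements))

# Counts for sets of 1, 2, ..., 6 elements.
counts = [count_one_to_one_mappings(range(n)) for n in range(1, 7)]
expected = [factorial(n) for n in range(1, 7)]
```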

To give a specific example, let $E$ consist of just three elements, and let us label
them simply 1, 2, 3. In this case $S(E)$ consists of the $3! = 6$ mappings listed
below:

(3.2)
$e = \begin{pmatrix} 1 & 2 & 3 \\ 1 & 2 & 3 \end{pmatrix}$    $s_1 = \begin{pmatrix} 1 & 2 & 3 \\ 3 & 1 & 2 \end{pmatrix}$
$s_2 = \begin{pmatrix} 1 & 2 & 3 \\ 2 & 3 & 1 \end{pmatrix}$    $s_3 = \begin{pmatrix} 1 & 2 & 3 \\ 2 & 1 & 3 \end{pmatrix}$
$s_4 = \begin{pmatrix} 1 & 2 & 3 \\ 3 & 2 & 1 \end{pmatrix}$    $s_5 = \begin{pmatrix} 1 & 2 & 3 \\ 1 & 3 & 2 \end{pmatrix}$

These are to be read as follows, taking $s_2$ for example: The symbol means that $s_2$
sends each element of the top row into the element directly beneath it in the
bottom row. Thus $s_2$ sends 1 into 2, 2 into 3, and 3 into 1. That is, $s_2(1) = 2$,
$s_2(2) = 3$, $s_2(3) = 1$. Similarly for the others. Observe that $s_2$ could be equally
well represented by the symbols

$\begin{pmatrix} 2 & 1 & 3 \\ 3 & 2 & 1 \end{pmatrix}$    $\begin{pmatrix} 2 & 3 & 1 \\ 3 & 1 & 2 \end{pmatrix}$    etc.

(The particular numbering of the mappings $s_1, s_2, \ldots, s_5$ is of no importance;
however we reserve the letter $e$ for the identity mapping.) To illustrate the product
of permutations we have, by (3.1), $s_3s_2(1) = s_3(s_2(1)) = s_3(2) = 1$; $s_3s_2(2) =
s_3(s_2(2)) = s_3(3) = 3$; $s_3s_2(3) = s_3(s_2(3)) = s_3(1) = 2$. Thus $s_3s_2$ is represented
by the symbol

$\begin{pmatrix} 1 & 2 & 3 \\ 1 & 3 & 2 \end{pmatrix}$

and so $s_3s_2 = s_5$. Similarly, $s_4s_3 = s_2$, $s_3s_4 = s_1$, $s_1s_1 = s_2$, and so forth. Of course
we have $es = se = s$ for any of the six permutations $s$; and $s_5^2 = e$, etc. It is
easily verified that the "multiplication table" of this group is the same as Table 6,
Sec. 6, Chap. 1.

‡ In this section we shall use the notation $ss'$ instead of $s \circ s'$ for the composition of two
permutations.

The notation used above for a set of three elements can be used equally well for a
set $E$ containing any finite number of elements, say $n$. If we think of the elements
of $E$ as labeled in some fixed order by the numbers $1, 2, \ldots, n$, and if $s$ is a
permutation of $E$, then just as in (3.2) we can represent $s$ by the symbol

(3.3)    $s = \begin{pmatrix} 1 & 2 & \cdots & n \\ s(1) & s(2) & \cdots & s(n) \end{pmatrix}$

or by any of the symbols derived from it by reordering the columns. In particular
the identity permutation $e$ is represented by

$\begin{pmatrix} 1 & 2 & \cdots & n \\ 1 & 2 & \cdots & n \end{pmatrix}$

and the inverse $s^{-1}$ of $s$ above can be represented by

$s^{-1} = \begin{pmatrix} s(1) & s(2) & \cdots & s(n) \\ 1 & 2 & \cdots & n \end{pmatrix}$

It is obvious that if $E$ and $E'$ are two sets of $n$ objects each, then $S(E)$ and $S(E')$
are essentially the same. We can make this more precise as follows: Labeling the
elements of both $E$ and $E'$ with the numbers $1, 2, \ldots, n$, we can represent the
permutations of $E$ and $E'$ by symbols of the type (3.3). If we associate a
permutation $s$ of $E$ with that permutation $s'$ of $E'$ represented by the same symbol, then
we establish a one-to-one correspondence between $S(E)$ and $S(E')$ which is easily
seen to be an isomorphism. Any such $S(E)$ is called the symmetric group on $n$
elements.


We now develop a somewhat different way of describing permutations. 


PROPOSITION 3.2 Let $s$ and $s'$ be two permutations of a set $E$, and suppose that no
element of $E$ is moved by both $s$ and $s'$. That is, suppose that if $s(x) \neq x$, then $s'(x) =
x$, and vice versa. Then $s's = ss'$.

The proof is trivial and is left as an exercise.


DEFINITION 3.1 Let $x_1, x_2, \ldots, x_r$ be distinct elements of a set $E$. Then the symbol
$(x_1, x_2, \ldots, x_r)$ will denote the permutation that sends $x_1$ into $x_2$, $x_2$ into $x_3$, $\ldots$,
$x_{r-1}$ into $x_r$, and finally $x_r$ into $x_1$, leaving all the other elements of $E$ fixed. The
permutation is called a cycle of order $r$.

Observe that, according to our definition, the symbols $(x_1, x_2, \ldots, x_r)$,
$(x_r, x_1, x_2, \ldots, x_{r-1})$, $(x_{r-1}, x_r, x_1, x_2, \ldots, x_{r-2})$, etc., all denote the same
permutation.


EXAMPLE Let $E$ consist of five elements 1, 2, 3, 4, 5. Then the symbols $(2, 3, 1)$
and $\begin{pmatrix} 1 & 2 & 3 & 4 & 5 \\ 2 & 3 & 1 & 4 & 5 \end{pmatrix}$ both denote the same permutation. From the definition of
the product of two permutations, $(2, 3, 1)(4, 1)$ is the same as the permutation
$\begin{pmatrix} 1 & 2 & 3 & 4 & 5 \\ 4 & 3 & 1 & 2 & 5 \end{pmatrix}$, as is easily verified. For example, $(4, 1)$ sends 4 into 1, and
$(2, 3, 1)$ sends 1 into 2. Therefore $(2, 3, 1)(4, 1)$ sends 4 into 2, etc.
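
The product in this example can be checked mechanically. The sketch below builds a cycle as a dictionary and multiplies with the right-hand factor applied first, following (3.1); the helper names `cycle` and `compose` are assumptions introduced for illustration.

```python
def cycle(elements, *xs):
    """Permutation of `elements` given by the cycle (x1, x2, ..., xr)."""
    s = {x: x for x in elements}          # everything fixed to begin with
    for i, x in enumerate(xs):
        s[x] = xs[(i + 1) % len(xs)]      # x_i -> x_{i+1}, and x_r -> x_1
    return s

def compose(s, t):
    """Product st: apply t first, then s."""
    return {x: s[t[x]] for x in t}

E = [1, 2, 3, 4, 5]
product = compose(cycle(E, 2, 3, 1), cycle(E, 4, 1))
```

The resulting dictionary lists the bottom row of the two-row symbol of $(2, 3, 1)(4, 1)$.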
Now let $E$ be a set of $n$ elements, and let $s$ be a permutation of $E$. Select an
element $x$ in $E$ and consider the list of elements†

(3.5)    $x, s(x), s^2(x), s^3(x), \ldots, s^k(x), s^{k+1}(x), \ldots$

obtained by applying $s$ repeatedly. Since $E$ has only $n$ elements, it is clear that at
least two of the elements above must be the same, say $s^j(x) = s^k(x)$, with $j < k$.
Apply $s^{-j}$ = inverse of $s^j$ to this equation, getting $s^{-j}s^j(x) = s^{-j}s^k(x)$, or $x =
s^{k-j}(x)$. Thus the element $x$ must appear in the list (3.5) more than once. Let $r$
be the least positive integer such that $s^r(x) = x$. Then $s^{r+1}(x) = s(s^r(x)) = s(x)$
and $s^{r+2}(x) = s(s^{r+1}(x)) = s^2(x)$, and so on; and it follows that (3.5) simply consists
of the $r$ elements $x, s(x), s^2(x), \ldots, s^{r-1}(x)$ repeated over and over again. Let us
write $x_1$ for $x$, $x_2$ for $s(x)$, $x_3$ for $s^2(x)$, $\ldots$, $x_r$ for $s^{r-1}(x)$. Then $s$ sends $x_1$ into $x_2$,
$x_2$ into $x_3$, $\ldots$, $x_{r-1}$ into $x_r$, and finally $x_r$ back to $x_1$. The set consisting of $x_1,
x_2, \ldots, x_r$ is called the $s$-orbit of $x = x_1$, and $s$ permutes these elements cyclically.
Referring to Definition 3.1, we see that the cycle $(x_1, x_2, \ldots, x_r)$ has the same
effect on the $r$ elements as does $s$. It is clear moreover that all the elements $x_1,
x_2, \ldots, x_r$ have the same $s$-orbit, namely, the set consisting of those $r$ elements.

If $r = n$, then all the elements of $E$ appear in the cycle $(x_1, x_2, \ldots, x_r)$, and in
this case $s$ is simply a cyclic permutation of $E$. If $r < n$, then there is at least one
element $y$ of $E$ not in the cycle $(x_1, x_2, \ldots, x_r)$. Repeating the foregoing
argument for the elements

$y, s(y), s^2(y), s^3(y), \ldots$

we obtain in the same way the $s$-orbit of $y$, consisting say of $k$ elements $y_1 = y$,
$y_2 = s(y)$, $\ldots$, $y_k = s^{k-1}(y)$, where $s(y_k) = s^k(y) = y$. Corresponding to this
orbit we have a cycle $(y_1, y_2, \ldots, y_k)$ of order $k$. This cycle has precisely the
same effect on the $k$ elements $y_1, y_2, \ldots, y_k$ as does $s$. The two cycles $(x_1, x_2,
\ldots, x_r)$ and $(y_1, y_2, \ldots, y_k)$ cannot have any elements in common. For
suppose that $x_i = y_j$, say. That is, $s^{i-1}(x) = s^{j-1}(y)$. Applying $s^{k-j+1}$ to both sides
we get $s^{k+i-j}(x) = s^k(y)$. But $s^k(y) = y$, from above, and so $s^{k+i-j}(x) = y$. Thus
$y$ is in the $s$-orbit of $x$, a contradiction.
† Recall that $s^2 = ss$, $s^3 = sss$, etc. See Sec. 7E, Chap. 2.

If the two cycles $(x_1, x_2, \ldots, x_r)$ and $(y_1, y_2, \ldots, y_k)$ do not contain all the
elements of $E$ (that is, if $r + k < n$), then we can continue the process. But it must
obviously stop after at most $n$ steps. We arrive finally at a decomposition of $E$ into
a certain number of $s$-orbits, with corresponding cycles $(x_1, x_2, \ldots, x_r)$,
$(y_1, y_2, \ldots, y_k)$, $\ldots$, $(w_1, \ldots, w_l)$. As just observed, no two of the orbits,
and thus no two of the cycles, have any elements in common. For this reason we
say that they are disjoint. The mapping $s$ produces a cyclic permutation in each
orbit, represented by the corresponding cycle, and therefore $s$ cannot send an
element of one orbit into another orbit. It follows readily that $s$ is the product of
the cycles corresponding to its orbits,

$s = (x_1, \ldots, x_r)(y_1, \ldots, y_k) \cdots (w_1, \ldots, w_l)$

We have therefore the following theorem:


THEOREM 3.3 $E$ being a finite set, any permutation $s$ of $E$ can be expressed as a product
of disjoint cycles. The elements of the cycles form the $s$-orbits in $E$, and that
representation of $s$ is consequently unique, apart from the order of the factors.

Note that since the cycles are disjoint the order in which they appear is
immaterial, by Proposition 3.2. A cycle of order 1 stands for the identity permutation, and
therefore such cycles can simply be omitted.


EXAMPLE Consider the permutation $s = \begin{pmatrix} 1 & 2 & 3 & 4 & 5 & 6 & 7 & 8 \\ 8 & 7 & 2 & 6 & 5 & 4 & 1 & 3 \end{pmatrix}$ of a set of eight
elements. Let us compute the $s$-orbit of, say, the element 1. The mapping sends
$1 \to 8$, $8 \to 3$, $3 \to 2$, $2 \to 7$, $7 \to 1$, and so the $s$-orbit of 1 consists of 1, 8, 3, 2, 7.
That set of five elements is also the $s$-orbit of 8, 3, 2, and 7, plainly. The $s$-orbit of 4
consists of the elements 4 and 6; the $s$-orbit of 5 consists of the single element 5.
That exhausts the set of eight elements, and we have $s = (1, 8, 3, 2, 7)(4, 6)(5)$. As
remarked above, the cycle $(5)$ represents the identity permutation, and so we can
also write $s = (1, 8, 3, 2, 7)(4, 6)$, or $s = (4, 6)(1, 8, 3, 2, 7)$.
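
The orbit computation of this example is easy to mechanize. In the sketch below the permutation is stored as a dictionary read off from the arrows of the example ($1 \to 8$, $8 \to 3$, and so on); the function name `disjoint_cycles` is an assumption made for illustration.

```python
def disjoint_cycles(s):
    """Decompose a permutation (dict x -> s(x)) into cycles, one per s-orbit."""
    seen, cycles = set(), []
    for x in sorted(s):
        if x in seen:
            continue
        orbit = [x]
        seen.add(x)
        y = s[x]
        while y != x:           # follow x, s(x), s^2(x), ... until x recurs
            orbit.append(y)
            seen.add(y)
            y = s[y]
        cycles.append(tuple(orbit))
    return cycles

s = dict(zip(range(1, 9), [8, 7, 2, 6, 5, 4, 1, 3]))
cycles = disjoint_cycles(s)
```

Cycles of order 1, such as `(5,)` here, are kept in the output and may be dropped as explained above.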


DEFINITION 3.2 Let $s$ be a permutation of a finite set $E$, and suppose that $s$ can be
expressed as a product of $h$ disjoint cycles, of orders $r_1, r_2, \ldots, r_h$, respectively.
Then the number $I(s) = (r_1 - 1) + (r_2 - 1) + \cdots + (r_h - 1)$ is called the index
of $s$. If $I(s)$ is even, then $s$ is said to be an even permutation; if $I(s)$ is odd, then $s$ is
said to be odd.

In this definition it is obviously immaterial whether cycles of order 1 are included
or not. Since the decomposition of $s$ into a product of disjoint cycles is unique,
apart from order and from cycles of order 1, it follows that the index $I(s)$ is uniquely
determined by $s$. In the example above, the permutation $(4, 6)(1, 8, 3, 2, 7)$ has
index $1 + 4 = 5$ and is therefore odd. The identity permutation always has index
zero and is therefore even.
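
Since each orbit of $r$ elements contributes $r - 1$, the index equals the number of elements of $E$ minus the number of $s$-orbits. The sketch below uses that reformulation, which is our own rearrangement of Definition 3.2, on the eight-element permutation whose cycles are $(4, 6)(1, 8, 3, 2, 7)$; the function name is hypothetical.

```python
def index(s):
    """I(s) for a permutation given as a dict x -> s(x).

    Each orbit of r elements contributes r - 1, so the index equals
    (number of elements) - (number of orbits), fixed points included.
    """
    seen, orbits = set(), 0
    for x in s:
        if x not in seen:
            orbits += 1
            while x not in seen:    # walk once around the orbit of x
                seen.add(x)
                x = s[x]
    return len(s) - orbits

s = dict(zip(range(1, 9), [8, 7, 2, 6, 5, 4, 1, 3]))
identity = {x: x for x in range(1, 9)}
is_odd = index(s) % 2 == 1
```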


THEOREM 3.4 Any permutation $s$ of a finite set $E$ can be expressed as a product of
cycles of order 2 (not necessarily disjoint and not unique).
Proof. Consider a cycle $(x_1, \ldots, x_r)$. It follows from the definition of
composition of mappings that

$(x_1, \ldots, x_r) = (x_r, x_1)(x_{r-1}, x_1) \cdots (x_3, x_1)(x_2, x_1)$

Therefore the assertion is true for cycles. Now if we express an arbitrary
permutation $s$ of $E$ as a product of disjoint cycles and in turn express
each of those cycles as a product of cycles of order 2, then we obtain
an expression for $s$ as a product of cycles of order 2. Q.E.D.
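
The factorization of a cycle into $r - 1$ transpositions used in this proof can be verified on a concrete cycle. The sketch below stores a permutation of $1, \ldots, n$ as a tuple (position 0 is unused), which is a representation chosen here for convenience; all function names are assumptions.

```python
def cycle(n, *xs):
    """The cycle (x1, ..., xr) on 1..n as a tuple s with s[x] = image of x."""
    s = list(range(n + 1))                 # s[x] = x; index 0 is unused
    for i, x in enumerate(xs):
        s[x] = xs[(i + 1) % len(xs)]
    return tuple(s)

def compose(s, t):
    """Product st: apply t first, then s."""
    return tuple(s[t[x]] for x in range(len(t)))

def as_transpositions(xs):
    """(x1, ..., xr) = (xr, x1)(x_{r-1}, x1) ... (x2, x1): r - 1 factors."""
    return [(x, xs[0]) for x in reversed(xs[1:])]

n, xs = 8, (1, 8, 3, 2, 7)
factors = as_transpositions(xs)
product = cycle(n)                         # start from the identity
for a, b in factors:
    product = compose(product, cycle(n, a, b))
```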


A cycle of order 2 is sometimes called a transposition because it simply
interchanges two elements of $E$. The main results of this section are Theorem 3.4 and
the following theorem:


THEOREM 3.5 Let $E$ be a finite set, and let $t_1, t_2, \ldots, t_m$ be transpositions of $E$, that
is, cycles of order 2. Then the permutation $s = t_1t_2 \cdots t_m$ of $E$ is even if $m$ is even
and is odd if $m$ is odd.


Proof. This can be shown directly by induction on $m$. However we
give a somewhat different proof. Let $n$ be the number of elements in $E$.
If we replace the elements of $E$ by the integers $1, 2, \ldots, n$ in some way,
then every permutation $s$ of $E$ can be regarded as a permutation of
$1, \ldots, n$. As usual, the effect of a permutation $s$ on an integer $j$ in that
set will be denoted by $s(j)$, so that $s$ can be represented by the symbol (3.3).

Now consider the ring $A = Z[x_1, \ldots, x_n]$ of polynomials in $n$
independent variables over the integers $Z$ (see Sec. 5, Chap. 6). For any
permutation $s$ of $1, \ldots, n$ there is a unique isomorphism of $A$ to itself (we
call it $s$ also) sending $x_j$ into $x_{s(j)}$ for $j = 1, \ldots, n$ (Theorem 5.2,
Corollary, Chap. 6). Denote simply by $sf$ the image of a polynomial $f$ under
this isomorphism. That is, $sf$ is the polynomial defined by $sf(x_1, \ldots,
x_n) = f(x_{s(1)}, \ldots, x_{s(n)})$, obtained by permuting the variables in $f$
according to $s$. In particular we have $sx_j = x_{s(j)}$ for the monomial $x_j$. If $s'$ is
another permutation of $1, \ldots, n$, then $s'(sx_j) = s'x_{s(j)} = x_{s'(s(j))} =
x_{s's(j)}$, and so $s'(sx_j) = (s's)x_j$. From this it follows easily that, for any
polynomial $f$ in $A$,

(3.6)    $s'(sf) = (s's)f$

Consider now the special polynomial

(3.7)    $h = \prod_{i<j} (x_i - x_j) = (x_1 - x_2)(x_1 - x_3) \cdots (x_{n-1} - x_n)$

For example, for $n = 3$ we have $h = (x_1 - x_2)(x_1 - x_3)(x_2 - x_3)$. Now
if $t$ is any transposition of two integers in $1, \ldots, n$, it is easily seen that
$th = -h$. Therefore, if $t_1, \ldots, t_m$ are all transpositions, then
$(t_1 \cdots t_m)h = (-1)^m h$, by repeated applications of (3.6). Hence, to
prove the theorem, we merely have to show that $sh = h$ if $s$ is an even
permutation and $sh = -h$ if $s$ is an odd permutation.

Write the arbitrary permutation $s$ of $1, \ldots, n$ as a product $s =
s_1s_2 \cdots s_k$ of disjoint cycles. If $s_k$ is the cycle $(j_1, \ldots, j_r)$, say, then $s_k$
can be expressed as a product of $r - 1$ transpositions, as in the proof of
Theorem 3.4. From the remarks above it follows that $s_kh = (-1)^{r-1}h$,
and $r - 1$ is precisely the index of $s_k$. By (3.6) we have $sh =
(s_1 \cdots s_{k-1})(s_kh) = (-1)^{r-1}(s_1 \cdots s_{k-1})h$. By repeated applications
of the argument we arrive at $sh = (-1)^b h$, where $b$ is the sum of the indices
$I(s_1) + \cdots + I(s_k)$. By Definition 3.2 this number is the index of $s$.
Hence we have $sh = (-1)^{I(s)}h$, which proves the theorem. Q.E.D.
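
The identity $sh = (-1)^{I(s)}h$ can be tested numerically by evaluating both polynomials at the point $x_j = j$, where $h$ does not vanish. The sketch below does this for all permutations of four symbols; the evaluation point and the function names are assumptions of the sketch, not part of the proof.

```python
from itertools import permutations
from math import prod

def sign_via_h(s):
    """Return sh / h, evaluating both polynomials at x_j = j.

    s is a tuple with s[j-1] = s(j).  By (3.7), h = prod_{i<j} (x_i - x_j)
    and sh = prod_{i<j} (x_{s(i)} - x_{s(j)}).
    """
    n = len(s)
    pairs = [(i, j) for i in range(n) for j in range(i + 1, n)]
    h = prod((i + 1) - (j + 1) for i, j in pairs)
    sh = prod(s[i] - s[j] for i, j in pairs)
    return sh // h              # exact: sh is either h or -h

def index(s):
    """I(s) = (number of elements) - (number of s-orbits)."""
    seen, orbits = set(), 0
    for x in range(1, len(s) + 1):
        if x not in seen:
            orbits += 1
            while x not in seen:
                seen.add(x)
                x = s[x - 1]
    return len(s) - orbits

agrees = all(sign_via_h(s) == (-1) ** index(s)
             for s in permutations(range(1, 5)))
```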


THEOREM 3.6 Let $s$ and $s'$ be two permutations of a finite set $E$. Then their product
$ss'$ is even if $s$ and $s'$ are both even or both odd; otherwise the product is an odd
permutation. The even permutations in $S(E)$ form a subgroup† $A(E)$ containing $n!/2$
elements, if $n$ is the number of elements in $E$.

This follows immediately from Theorems 3.4 and 3.5.


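EXERCISES
1. Let $E$ be a set of $n$ elements, and let $E_1$ and $E_2$ be two subsets, each containing
$r$ elements. Determine the number of permutations of $E$ which send $E_1$ into $E_2$.
2. Write each of the following permutations as a product of disjoint cycles.
Determine the indices of the permutations, and express each one as a product of
transpositions:

(a) $\begin{pmatrix} 1 & 2 & 3 & 4 & 5 & 6 & 7 \\ 7 & 6 & 5 & 4 & 3 & 2 & 1 \end{pmatrix}$    (b) $\begin{pmatrix} 7 & 6 & 5 & 4 & 3 & 2 & 1 \\ 4 & 3 & 1 & 2 & 6 & 7 & 5 \end{pmatrix}$

(c) $\begin{pmatrix} 1 & 2 & 3 & 4 & 5 & 6 & 7 & 8 & 9 & 10 \\ 8 & 4 & 2 & 1 & 3 & 6 & 10 & 9 & 7 & 5 \end{pmatrix}$    (d) $\begin{pmatrix} 2 & 8 & 8 & 9 & 1 & 4 & 7 & 6 & 5 \\ 1 & 8 & 9 & 6 & 7 & 5 & 4 & 2 & 3 \end{pmatrix}$

3. Write the following permutations of a set of eight elements in the form (3.3),
and determine whether they are odd or even. Write them also as products of
transpositions.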
(a) (L, 8, 5)6, 4, 8, 206, 6 7, 8) (>) (4, L 2, 8, 4, 1, 2, 3) 
(e) G, 2)(2, 491, 7)(7, 6, 8) (a) (5, 4, 8 6, 7, 1, 3, 2) 


4. Let $E$ be a set of $n$ elements, and let $E'$ be a subset of $r$ elements. How many
permutations of $E$ send each element of $E'$ into itself? Show that all such
permutations form a subgroup of $S(E)$.

5. With $E$ and $E'$ as above, how many permutations of $E$ map $E'$ into itself?
Show that all such permutations form a subgroup of $S(E)$.

*6. Let $E$ be a finite set, and let $G$ be a transitive group of permutations of $E$
generated by transpositions. That is, let $G$ be a subgroup of $S(E)$ such that (1) for
any pair of elements $x$, $y$ in $E$ there is an element of $G$ which sends $x$ into $y$ and
(2) every element of $G$ can be expressed as a product of transpositions which are
themselves elements of $G$. Prove that $G = S(E)$.

7. Consider permutations of $1, 2, \ldots, n$. Assuming $i < j$, show that the
transposition $(i, j)$ can be expressed as a product of transpositions of adjacent
elements of $i, i + 1, \ldots, j$, necessarily an odd number of them.

† $A(E)$ is called the alternating group on $n$ elements.


4. Subgroups and quotient groups 


We now return to a discussion of arbitrary groups. The definition of quotient 
groups given below is very closely connected with the definition of quotient spaces 
in Sec. 10, Chap. 9. We begin by defining two operations on the subsets of a group. 


DEFINITION 4.1 Let $G$ be a group (multiplicative notation), and let $A$ be a non-empty
subset of $G$. Then $A^{-1}$ will denote the set consisting of all the inverses of the elements
in $A$. If $B$ is a second non-empty subset, then $AB$ will denote the set of all products $ab$,
where $a$ is an element of $A$ and $b$ is an element of $B$.

From Definition 4.1 it is trivial to verify the rules

(4.1)    $(AB)^{-1} = B^{-1}A^{-1}$,    $A(BC) = (AB)C$

for subsets of $G$. The following fact is useful:

(4.2)    $HH = H$ and $H^{-1} = H$ if and only if $H$ is a subgroup of $G$

If these equalities hold, and if $a$, $b$ are elements of $H$, then from the first equality
it follows that $ab$ is in $H$, and from the second equality it follows that $a^{-1}$ and $b^{-1}$
are in $H$. Hence $H$ is a subgroup of $G$. The converse is equally trivial to verify.

If $H$, $K$ are subgroups of $G$, the set $HK$ is not in general a subgroup. However
we have

(4.3)    If $H$, $K$ are subgroups of $G$ such that $HK = KH$, then $HK$ is also a subgroup
of $G$

For $(HK)^{-1} = K^{-1}H^{-1}$, and this is equal to $KH$, by (4.2), hence is equal to $HK$,
by assumption. Thus $(HK)^{-1} = HK$. Similarly, $(HK)(HK) = H(KH)K =
HHKK = HK$. Hence $HK$ satisfies (4.2).
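
A small case shows both situations. In the sketch below permutations of three symbols $0, 1, 2$ are stored as tuples, and the subgroups $H$, $K$, $A$ are our own illustrative choices: $HK \neq KH$ for two subgroups generated by transpositions, while $HA = AH$ when $A$ is the subgroup of even permutations. The function names are assumptions.

```python
def compose(s, t):
    """Product st of permutations stored as tuples: (st)(x) = s(t(x))."""
    return tuple(s[t[x]] for x in range(len(t)))

def set_product(A, B):
    """AB = all products ab with a in A and b in B (Definition 4.1)."""
    return {compose(a, b) for a in A for b in B}

def is_subgroup(H):
    """Criterion (4.2): HH = H and H^{-1} = H, for a non-empty finite H."""
    inverses = {tuple(sorted(range(len(s)), key=lambda x: s[x])) for s in H}
    return set_product(H, H) == H and inverses == H

e = (0, 1, 2)
H = {e, (1, 0, 2)}              # generated by the transposition of 0 and 1
K = {e, (2, 1, 0)}              # generated by the transposition of 0 and 2
A = {e, (1, 2, 0), (2, 0, 1)}   # the even permutations

HK, KH = set_product(H, K), set_product(K, H)
HA, AH = set_product(H, A), set_product(A, H)
```

Here $HK$ has four elements, which already rules it out as a subgroup of a group of order six once Lagrange's theorem is available.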


DEFINITION 4.2 Let $H$ be a subgroup of a group $G$. Then any set of the type $aH$ ($a$ an
element of $G$) is called a left coset of $H$; any set of the type $Ha$ is called a right coset
of $H$.

According to Definition 4.1 above, the left coset $aH$ consists of all the elements
$ax$, where $x$ is an arbitrary element of $H$. Similarly, $Ha$ consists of all elements $xa$.
Since $H$ contains the identity element $e$, it is clear that $aH$ and $Ha$ both contain $a$.
Hence every element of $G$ is contained in a left coset of $H$ and in a right coset of $H$.
It is plain that $eH = H$ and $He = H$, and so $H$ itself appears among the cosets;
$aH = Ha$ if the group $G$ is commutative.

REMARK. In the case of an abelian group with the additive notation, we naturally
write $A + B$ instead of $AB$ in Definition 4.1. This operation among subsets of
$G$ is then commutative. If $A$, $B$ are subgroups of $G$, then so is $A + B$. $H$ being
a subgroup of $G$, its (left) cosets are written $a + H$ instead of $aH$, and clearly
$a + H = H + a$.

EXAMPLE 1 Let $m$ be a positive integer, and let $H$ be the subgroup of the additive
group of integers $Z^+$ consisting of all multiples $0, \pm m, \pm 2m, \pm 3m$, etc. of $m$. If $a$
is any integer, then the coset $a + H$ consists of all the integers $a + km$ ($k$ an
arbitrary integer). In other words, $a + H$ consists of all integers which are congruent
to $a$ modulo $m$; that is, $a + H$ is the residue class modulo $m$ that contains $a$.


THEOREM 4.1 Let $H$ be a subgroup of a group $G$, and let $c$ be an element of $G$. Then
$cH = H$ if and only if $c$ is an element of $H$. Similarly, $Hc = H$ if and only if $c$ is in $H$.
Proof. $c = ce$ is an element of $cH$, hence must be an element of $H$ if
$cH = H$. Conversely, if $c$ is in $H$ and if $x$ is in $H$, then $cx$ is in $H$, since $H$
is a subgroup, and so $cH$ is certainly a subset of $H$. But, given any $y$ in
$H$, there is an element $x$ in $H$ such that $cx = y$, namely $x = c^{-1}y$.
Therefore $cH$ must be identical with $H$. The proof for $Hc$ is quite similar.
Q.E.D.


COROLLARY Two elements $a$, $b$ of $G$ are in the same left coset of $H$ if and only if $a^{-1}b$
is in $H$. Every element of $G$ is in one and only one left coset of $H$.
Proof. Suppose that $a$ and $b$ are both in the left coset $cH$. Then, by
definition of the latter, there are elements $x$, $y$ in $H$ such that $a = cx$ and
$b = cy$. From this there follows $a^{-1} = x^{-1}c^{-1}$, whence $a^{-1}b = x^{-1}y$, which
is an element of $H$, since $H$ is a subgroup of $G$. Conversely, if $a^{-1}b$ is in $H$,
then $a^{-1}bH = H$, by the theorem above. Multiplying by $a$ we get $bH =
aH$. But $bH$ contains $be = b$, and $aH$ contains $a$; so $a$ and $b$ are in the same
left coset. Every element of $G$ is in some left coset, since $a$ is in $aH$.
Suppose now that $a$ is in another left coset $bH$. The latter contains $b$, and
therefore $a^{-1}b$ is in $H$, and so $aH = bH$, as we have just seen; the two cosets
are identical. Q.E.D.


REMARK. The corollary holds equally well for right cosets, save that $a^{-1}b$ in the
statement must be replaced by $ab^{-1}$. In order to minimize repetition, we shall
generally state and prove things for left cosets.

The corollary says that the subgroup $H$ determines a decomposition of $G$ into
disjoint (i.e., nonoverlapping) subsets, namely, the left cosets of $H$. The number
of distinct left cosets of $H$ in $G$ is called the index of $H$ in $G$. This is the same as
the number of distinct right cosets, as is easily seen. For $(aH)^{-1} = H^{-1}a^{-1} =
Ha^{-1}$, by (4.1) and (4.2). That is, the set of inverses of elements in $aH$ is precisely
the right coset $Ha^{-1}$. In this way there is established a one-to-one correspondence
between left and right cosets of $H$.
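
The decomposition into cosets can be listed explicitly in a small group. The sketch below takes $G$ to be the permutations of three symbols $0, 1, 2$ stored as tuples and $H$ a subgroup of two elements; both are our own illustrative choices, and the names are assumptions.

```python
from itertools import permutations

def compose(s, t):
    """Product st of permutations stored as tuples: (st)(x) = s(t(x))."""
    return tuple(s[t[x]] for x in range(len(t)))

G = set(permutations(range(3)))
H = {(0, 1, 2), (1, 0, 2)}

# Each coset is stored as a frozenset so that equal cosets coincide.
left_cosets = {frozenset(compose(a, x) for x in H) for a in G}
right_cosets = {frozenset(compose(x, a) for x in H) for a in G}
covered = set().union(*left_cosets)
```

The left cosets are pairwise disjoint and cover $G$, and there are as many right cosets as left cosets, although the two families of sets differ for this particular $H$.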


EXAMPLE 2 Let $H$ be the subgroup of $Z^+$ defined in Example 1. The cosets of $H$
(left or right) are the residue classes of integers modulo $m$, and in Sec. 10, Chap. 2,
it was shown that there are $m$ of them. That is, the index of $H$ in $Z^+$ is $m$.

It will be convenient to use the symbol $\#E$ to denote the number of elements in
an arbitrary (finite) set $E$. A group $G$ is called finite if it consists of only a finite
number of elements, and that number $\#G$ is called the order of $G$. If $H$ is a
subgroup of $G$, then $\#(aH) = \#H$ for any $a$ in $G$. That is, any left coset of $H$ contains
the same number of elements as $H$. For the mapping $H \to aH$ defined by $x \to ax$
is a one-to-one mapping. Assuming that $G$ is finite, put $n = \#G$, $m = \#H$, and
let $i$ denote the index of $H$ in $G$ (the number of distinct left cosets). From the
corollary above we clearly have $n = mi$, for the left cosets of $H$ decompose $G$ into $i$
disjoint sets of $m$ elements each. We state this important result as


THEOREM 4.2 (Lagrange) Let $G$ be a group of order $n$, and let $H$ be a subgroup of
order $m$. Let $i$ be the index of $H$ in $G$. Then $n = m \cdot i$. In particular, $m$ divides $n$.
The order of any element $a$ in $G$ divides the order $n$ of $G$, and $a^n = e$.

We recall from Theorem 2.2 that the order of an element $a$ is the least positive
integer $d$ such that $a^d = e$. In other words, $d$ is the number of elements in the
cyclic subgroup $(a)$ generated by $a$. By what has been shown, $d$ must divide $n$,
say $n = dk$. Then $a^n = a^{dk} = (a^d)^k = e$.
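
The last statement of the theorem can be checked by brute force on the symmetric group on four elements, which has order 24. The sketch below is an illustration only; permutations are stored as tuples on $0, 1, 2, 3$ and the function names are assumptions.

```python
from itertools import permutations

def compose(s, t):
    """Product st of permutations stored as tuples: (st)(x) = s(t(x))."""
    return tuple(s[t[x]] for x in range(len(t)))

def power(a, k):
    """a^k for k >= 0, by repeated multiplication."""
    result = tuple(range(len(a)))      # the identity e
    for _ in range(k):
        result = compose(result, a)
    return result

def order(a):
    """Least positive d with a^d = e."""
    e, d, p = tuple(range(len(a))), 1, a
    while p != e:
        p, d = compose(p, a), d + 1
    return d

G = list(permutations(range(4)))       # the symmetric group on 4 elements
n = len(G)
orders = sorted({order(a) for a in G})
every_order_divides_n = all(n % order(a) == 0 for a in G)
nth_power_is_identity = all(power(a, n) == tuple(range(4)) for a in G)
```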


COROLLARY Let $G$ be a group whose order is a prime $p$. Then $G$ is a cyclic group,
and it is generated by any of its elements $a$ other than the identity. That is, $G$ consists
of the powers $e, a, a^2, \ldots, a^{p-1}$. In particular, $G$ is abelian.
Proof. The element $a$ generates a cyclic subgroup $(a)$, by Theorem 2.2.
Since $a \neq e$, the order of this subgroup must be $p$, by the theorem above,
and so $G = (a)$. Q.E.D.


Let $H$ be a subgroup of a group $G$, and consider the set $L$ whose elements are the
various left cosets of $H$. Under certain circumstances $L$, with the operation of
Definition 4.1, is itself a group. Now $H$ itself appears among the left cosets of $H$,
and so $H$ is an element of $L$. Since $HH = H$, by (4.2), it follows that if $L$ does
happen to be a group with the operation of Definition 4.1, then $H$ must be the
identity element of $L$. Then $H(cH) = cH$, for any left coset $cH$. Now the set
standing on the left of this equation must contain the subset $Hce = Hc$. Therefore,
$Hc$ is contained in $cH$. By the same argument applied to $c^{-1}H$, we find that $Hc^{-1}$
is contained in $c^{-1}H$. Consequently $(Hc^{-1})^{-1}$ is contained in $(c^{-1}H)^{-1}$. That is,
$cH$ is contained in $Hc$ [using (4.1) and (4.2)]. Therefore, if $L$ is a group, then
$cH = Hc$ for any element $c$ in $G$.


DEFINITION 4.3 A subgroup $H$ of a group $G$ is called a normal (or invariant)
subgroup if $cH = Hc$ for every $c$ in $G$, or what is the same thing, if $H = c^{-1}Hc$.


REMARK. Let $c$ be a fixed element of $G$. The mapping $G \to G$ that sends an
arbitrary element $x$ into $c^{-1}xc$ is an isomorphism of $G$ to itself (cf. Exercise 3,
Sec. 2). Isomorphisms obtained in this way are called inner automorphisms of $G$.
According to Definition 4.3, a subgroup $H$ is a normal subgroup if and only if $H$
is mapped to itself by every inner automorphism of $G$.




The mapping $x \to c^{-1}xc$ takes any subgroup $H$ into a new subgroup $c^{-1}Hc$. Any
such subgroup is called a conjugate of $H$. (Taking $c = e$, we see that $H$ is included
amongst its conjugates.) Replacing $c$ by $c^{-1}$ we get a new conjugate $cHc^{-1}$. Thus
we can write the conjugates of $H$ either in the form $c^{-1}Hc$ or else $cHc^{-1}$. (However,
the two conjugates indicated here are different, in general.)

Every group G has at least two normal subgroups, namely, G itself and the 
subgroup (e) consisting of the identity element. Every subgroup of an abelian 
group is a normal subgroup. 

A normal subgroup H of G is simply one for which the left and right cosets are 
identical, and in this situation we can speak simply of the cosets of H, without 
specifying left or right. 
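
Normality can be tested directly from Definition 4.3 by forming all conjugates $c^{-1}Hc$. The sketch below does this in the group of permutations of three symbols $0, 1, 2$; the subgroups $A$ and $H$ and all function names are assumptions chosen for illustration.

```python
from itertools import permutations

def compose(s, t):
    """Product st of permutations stored as tuples: (st)(x) = s(t(x))."""
    return tuple(s[t[x]] for x in range(len(t)))

def inverse(s):
    """Inverse permutation: position s(x) of the result holds x."""
    return tuple(sorted(range(len(s)), key=lambda x: s[x]))

def conjugate(H, c):
    """The conjugate subgroup c^{-1} H c."""
    return {compose(compose(inverse(c), x), c) for x in H}

def is_normal(H, G):
    return all(conjugate(H, c) == H for c in G)

G = set(permutations(range(3)))
A = {(0, 1, 2), (1, 2, 0), (2, 0, 1)}   # the even permutations
H = {(0, 1, 2), (1, 0, 2)}              # generated by one transposition
conjugates_of_H = {frozenset(conjugate(H, c)) for c in G}
```

In this example $A$ is normal, while $H$ has three distinct conjugates and so is not.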


THEOREM 4.3 Let $G$ be a group, and let $H$ be a subgroup. If the left cosets of $H$ form a
group with the operation of Definition 4.1, then $H$ is a normal subgroup of $G$.
Conversely, if $H$ is a normal subgroup, then the cosets of $H$ form a group with the binary
operation of Definition 4.1. That group is called the quotient group (or factor group)
of $G$ by $H$ and is denoted by $G/H$. The product $(aH)(bH)$ of two elements of $G/H$ is the
coset $abH$; the inverse of $aH$ is $a^{-1}H$; the identity element of $G/H$ is the coset $H$. The
mapping $G \to G/H$ defined by $a \to aH$ is a homomorphism (called the canonical
homomorphism). Its kernel† is $H$.

The first assertion has already been proved. If $H$ is a normal subgroup, then
$(aH)(bH) = a(Hb)H = a(bH)H = abHH = abH$, as claimed, by (4.2). The other
verifications are entirely routine. For example, to show that $H$ is the identity
element of $G/H$, we have $H(aH) = (Ha)H = (aH)H = aHH = aH$, and
$(aH)H = aHH = aH$. To show that $a^{-1}H$ is the inverse of $aH$, $(a^{-1}H)(aH) =
a^{-1}(Ha)H = a^{-1}(aH)H = eHH = H$, and by the same reasoning $(aH)(a^{-1}H) = H$.
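
The product rule $(aH)(bH) = abH$ can be observed on a small example, together with its failure for a subgroup that is not normal. The sketch below uses the permutations of three symbols $0, 1, 2$; the choice of subgroups and the function names are assumptions made for illustration.

```python
from itertools import permutations

def compose(s, t):
    """Product st of permutations stored as tuples: (st)(x) = s(t(x))."""
    return tuple(s[t[x]] for x in range(len(t)))

def coset(a, H):
    """The left coset aH."""
    return frozenset(compose(a, x) for x in H)

def set_product(A, B):
    """The set AB of Definition 4.1."""
    return frozenset(compose(a, b) for a in A for b in B)

G = set(permutations(range(3)))
A = frozenset({(0, 1, 2), (1, 2, 0), (2, 0, 1)})   # normal in G
H = frozenset({(0, 1, 2), (1, 0, 2)})              # not normal in G

products_are_cosets = all(
    set_product(coset(a, A), coset(b, A)) == coset(compose(a, b), A)
    for a in G for b in G)
cosets_of_H = {coset(a, H) for a in G}
closed_for_H = all(set_product(X, Y) in cosets_of_H
                   for X in cosets_of_H for Y in cosets_of_H)
```

For the normal subgroup the cosets multiply as the theorem states; for $H$ some products of two left cosets are not left cosets at all.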


THEOREM 4.4 Let $f$ be a homomorphism from a group $G$ to a group $G'$, and let $H$ be
the kernel† of $f$. Then $H$ is a normal subgroup of $G$, and all the elements in any coset
of $H$ are mapped by $f$ into the same element of $G'$. The mapping $\bar{f}: G/H \to G'$ defined
by $\bar{f}(aH) = f(a)$ is a homomorphism and in fact is an isomorphism between $G/H$ and
$f(G)$; $\bar{f}$ is called the mapping induced by $f$.
Proof. By definition, $x$ is in $H$ if and only if $f(x) = e'$ = identity of
$G'$. Let $c$ be any element of $G$, and let $x$ be any element of $H$. Then
$f(c^{-1}xc) = f(c^{-1}) \cdot f(x) \cdot f(c) = f(c)^{-1} \cdot e' \cdot f(c) = f(c)^{-1}f(c) = e'$, since $f(c^{-1}) =
f(c)^{-1}$. Thus $c^{-1}xc$ is also in $H$. This shows that $c^{-1}Hc$ is contained in $H$.
Replacing $c$ by $c^{-1}$ in this argument, we find similarly that $cHc^{-1}$ is
contained in $H$; hence $c^{-1}(cHc^{-1})c$ is contained in $c^{-1}Hc$; that is, $H$ is contained
in $c^{-1}Hc$. Therefore $H = c^{-1}Hc$, showing that $H$ is a normal subgroup
of $G$.
Now if $a$, $b$ are in the same coset of $H$, then $a^{-1}b$ is in $H$, by the corollary
of Theorem 4.1. Thus $f(a^{-1}b) = e'$, and so $f(b) = f(aa^{-1}b) = f(a)f(a^{-1}b) =
f(a) \cdot e' = f(a)$. Therefore, $f$ maps all the elements in any coset of $H$ into
the same element of $G'$, and consequently the definition of $\bar{f}: G/H \to G'$ is
consistent, for if $aH = bH$, then $f(a) = f(b)$. It is trivial to verify that
$\bar{f}$ has the properties stated. Q.E.D.

† See Exercise 1, Sec. 2.
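
The induced mapping can be tabulated for a simple homomorphism. The sketch below takes $G$ to be the additive group of residues modulo 12 and $f(x) = 3x$ modulo 12, an example chosen by us rather than taken from the text; the variable names are assumptions.

```python
n = 12
G = range(n)                       # the additive group of residues mod 12

def f(x):
    """A homomorphism from G to itself: f(x + y) = f(x) + f(y) mod 12."""
    return (3 * x) % n

is_homomorphism = all(f((x + y) % n) == (f(x) + f(y)) % n for x in G for y in G)
kernel = frozenset(x for x in G if f(x) == 0)
cosets = {frozenset((a + x) % n for x in kernel) for a in G}
# f is constant on each coset, so f_bar(a + H) = f(a) is well defined.
f_bar = {c: {f(a) for a in c} for c in cosets}
image = {f(a) for a in G}
```

Each coset of the kernel is sent to a single element, distinct cosets go to distinct elements, and every element of the image is reached, which is the content of the isomorphism between $G/H$ and $f(G)$ in this case.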


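THEOREM 4.5 Let $f$ be a homomorphism of a group $G$ onto a group $G'$ (that is, $f(G) =
G'$). Then $f$ maps any normal subgroup of $G$ onto a normal subgroup of $G'$.
Proof. Let $K$ be a normal subgroup of $G$, and let $K'$ consist of all
elements $f(a)$, with $a$ in $K$. If $a$, $b$ are in $K$, so are $ab$, $a^{-1}$, $b^{-1}$. Hence
$f(ab) = f(a) \cdot f(b)$ and $f(a^{-1}) = f(a)^{-1}$, $f(b^{-1}) = f(b)^{-1}$ are also in $K'$,
showing that $K'$ is a subgroup of $G'$. Now take any $c'$ in $G'$. We want to show
that $c'^{-1} \cdot K' \cdot c' = K'$. By assumption there is some $c$ in $G$ such that
$f(c) = c'$. For any $a$ in $K$ the element $c^{-1}ac$ is in $K$, and so $f(c^{-1}ac) =
f(c)^{-1}f(a)f(c) = c'^{-1} \cdot f(a) \cdot c'$ is in $K'$. Thus $c'^{-1}K'c'$ is contained in $K'$ for
every $c'$ in $G'$; replacing $c'$ by $c'^{-1}$ gives the reverse inclusion, as in the proof of
Theorem 4.4. Q.E.D.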


The following two theorems are usually called the first and second isomorphism 
theorems, respectively. 


tHEoREM 46 Let H, K be subgroups of a group G, and suppose that K is a normal 
subgroup. Lel f: G—G/K be the canonical homomorphism. Then f maps H onto 
HK/K;, and if h denotes the mapping f restricted to elements of H, then the kernel of h 
is HV K, the set of elements common to both H and K. HO K isa normal subgroup 
of H, and the mapping hi: H/H\ K + G/K induced by h maps H/H ( K iso- 
morphicaily onto HK /K. 

Proof. HK = KH, since K is a normal subgroup of G. Therefore HK
is a subgroup of G, by (4.8), and it clearly contains K, since H contains e.
Thus K is a subgroup of HK, and it is a normal subgroup, since $c^{-1}Kc = K$
for any c in G, hence a fortiori for any c in HK.

Any element in HK can be expressed in the form ac, with a in H and c in
K. For the coset we have (ac)K = a(cK) = aK, by Theorem 4.1.
Hence every coset of K in HK can be written in the form aK, with a in H.
Now the canonical homomorphism f maps any a in G into the coset aK.
In particular, this is so for elements a in H, and so f maps H onto the
system of all cosets aK (a in H). As we have just seen, this is precisely
HK/K.

Denoting by h the mapping $H \to G/K$ obtained by confining f to elements
of H, we have h(a) = aK for any a in H. And, as we have just
seen, h maps H onto the subgroup HK/K of G/K. That is, h(H) =
HK/K. The kernel of h consists of all elements of H which are mapped
by h into the identity element of G/K. Now that identity element is the
coset K. Hence, the kernel of h consists of all a in H such that h(a) = K,
that is, such that aK = K. By Theorem 4.1 that is so if and only if a is
in K. But a is also in H, and therefore the kernel of h is the set $H \cap K$
of all elements common to both H and K. Then $H \cap K$ is a normal subgroup
of H, by Theorem 4.4; and by that same theorem (applied to h),
the induced mapping $\bar h\colon H/H \cap K \to G/K$ maps the first group
isomorphically onto h(H) = HK/K. Q.E.D.


Observe that $\bar h$ simply maps a coset $a(H \cap K)$ into the coset aK.
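
The correspondence of Theorem 4.6 can be checked by brute force on a small case. The following is only an illustrative sketch, not part of the text: it assumes G is the symmetric group on {0, 1, 2, 3}, takes for K the normal four-element subgroup generated by two double transpositions and for H the cyclic subgroup generated by a 4-cycle, and verifies that sending the coset $a(H \cap K)$ to aK is a well-defined one-to-one map onto HK/K. The helper names (`compose`, `generate`, `left_coset`) are hypothetical.

```python
def compose(p, q):
    """Return the permutation 'p after q' (apply q first, then p)."""
    return tuple(p[i] for i in q)

def generate(gens, n):
    """Close the generators under composition; in a finite group this is a subgroup."""
    identity = tuple(range(n))
    group = {identity}
    frontier = [identity]
    while frontier:
        x = frontier.pop()
        for g in gens:
            y = compose(g, x)
            if y not in group:
                group.add(y)
                frontier.append(y)
    return group

def left_coset(a, S):
    """The left coset aS as a frozenset."""
    return frozenset(compose(a, s) for s in S)

# Assumed example: K = normal four-group in S_4, H = cyclic group of a 4-cycle.
K = generate([(1, 0, 3, 2), (2, 3, 0, 1)], 4)
H = generate([(1, 2, 3, 0)], 4)
H_meet_K = H & K
HK = {compose(a, c) for a in H for c in K}

# Build the induced map a(H ∩ K) -> aK and check it does not depend on a.
induced = {}
well_defined = True
for a in H:
    source, target = left_coset(a, H_meet_K), left_coset(a, K)
    if induced.setdefault(source, target) != target:
        well_defined = False

cosets_of_K_in_HK = {left_coset(x, K) for x in HK}
is_bijection = (
    well_defined
    and len(set(induced.values())) == len(induced)
    and set(induced.values()) == cosets_of_K_in_HK
)
```

Here `is_bijection` only confirms the counting part of the theorem (the two quotient sets correspond one-to-one); the homomorphism property is not tested in this sketch.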


THEOREM 4.7 Let K be a normal subgroup of a group G, and let $f\colon G \to \bar G$ be the
canonical homomorphism of G onto the quotient group $\bar G = G/K$. Let $\bar H$ be a normal
subgroup of $\bar G$, and denote by H the set of all elements in G which are mapped by f
into $\bar H$. Then H is a normal subgroup of G, containing K, and the groups G/H and
$\bar G/\bar H$ are isomorphic in a natural way.

Proof. Let $g\colon \bar G \to \bar G/\bar H$ be the canonical homomorphism. Then the
composition gf is a homomorphism $G \to \bar G/\bar H$. The kernel of gf consists
of all x in G such that $gf(x) = \bar H$ = identity of $\bar G/\bar H$. Now by definition,
$gf(x) = g(f(x)) = f(x) \cdot \bar H$, and this is equal to $\bar H$ if and only if f(x) is in
$\bar H$, which is so if and only if x is in H, by definition of H. Therefore H is
the kernel of gf and accordingly is a normal subgroup of G (Theorem 4.4).
The mapping $G/H \to \bar G/\bar H$ induced by gf is an isomorphism, by Theorem 4.4. Q.E.D.


EXERCISES 

1. Let $f\colon G \to G'$ be a homomorphism of groups, and let H' be a normal subgroup
of G'. Let H be the set of all x in G such that f(x) is in H'. Prove that H is
a normal subgroup of G. Prove that G/H is isomorphic to a subgroup of G'/H'.

2. Let H be a normal subgroup of a group G, and let H' be a normal subgroup
of a group G'. Let $f\colon G \to G'$ be a homomorphism such that f(x) is in H' whenever
x is in H. Let $g\colon G \to G/H$ and $g'\colon G' \to G'/H'$ be the canonical
homomorphisms. Prove that there is one and only one homomorphism
$\bar f\colon G/H \to G'/H'$ such that $\bar f g = g'f$. (This $\bar f$ is called the
homomorphism induced by f.)

*3. Let G be a group. By a commutator of G is meant an element of the type
$aba^{-1}b^{-1}$ (a, b in G, naturally). Let K denote the set of all elements in G which
can be expressed as products of commutators. Prove that K is a normal subgroup
of G and that G/K is abelian. [Hint: Show that conjugation $x \to c^{-1}xc$ by an
element c sends a commutator into a commutator.] Prove that if $f\colon G \to G'$ is a
homomorphism of G to an abelian group, then the kernel of f must contain K.
(K is called the commutator subgroup of G.)

4. Let H be a subgroup of a group G. Let L consist of all x in G such that
xH = Hx. Show that L is a subgroup of G and that H is a normal subgroup of L.
Show that if K is any subgroup of G containing H and if H is normal in K,
then K is a subgroup of L. (L is called the normalizer of H in G.)


5. G being a group, let C denote the subset consisting of all elements c such
that cx = xc for all x in G. Show that C is a normal subgroup of G (called the
center of G).

6. Let $S_n$ denote the symmetric group of n elements, and let $A_n$ denote the
alternating subgroup (that is, consisting of all even permutations; cf. Sec. 3).
Show that $A_n$ is a normal subgroup of $S_n$ and that $S_n/A_n$ is a cyclic group of
order 2.

7. Let H be a subgroup of index 2 in a group G. Prove that H is a normal 
subgroup. 

8 Referring to Table 5, Sec. 6, Chap. 1, show that the subgroups {p, 8], 
{p, t, {} are normal subgroups. Write out the cosets for both of them, and com- 
pute the multiplication tables for the corresponding quotient groups. 

9. Let G be a group of prime order (that is, #G is a prime number). Show that
G is abelian and that its only subgroups are G itself and the identity subgroup.

10. Let G be a group of order $p^\alpha$, where p is a prime. Show that any subgroup
has order $p^\beta$ for some $\beta \le \alpha$.

11. Let G be a group of order mk, and let H be a subgroup of order m. For any
element a in G show that there is an integer h, with $1 \le h \le k$, such that $a^h$ is in H.
If H is normal, prove that $a^k$ is in H for any a. [Hint: Consider the cosets H,
aH, $a^2H$, etc.]

12. Let H be a subgroup of a group G, and let L be the normalizer of H (see
Exercise 4). Prove that the number of distinct conjugates of H is equal to #G/#L.

*13. Let E be a euclidean space (Sec. 11, Chap. 8), and let G be the set of all
distance-preserving mappings $f\colon E \to E$. That is, the distance between f(p) and
f(q) must be equal to the distance between p and q for any two points of E. Prove
that G with composition of mappings as binary operation is a group. Prove that
an element f of G is a translation if and only if
$\overrightarrow{f(p)f(q)} = \overrightarrow{pq}$ for any two points
p, q of E. Deduce that the set of all translations T of E is a normal subgroup
of G. Prove that the quotient group G/T is isomorphic to the subgroup $G_0$ of G
consisting of all f which leave a specified point $p_0$ fixed, that is, which map $p_0$ to
itself. (Show that each coset of T contains one and only one mapping in $G_0$.)

14. In many primitive societies, the rules governing allowed marriages are quite
complicated. As we will see, algebraic formalism may simplify the study of these
marriage laws.† Suppose that, in a given society, there are n types of marriage,
denoted by $M_1, M_2, \ldots, M_n$, subject to the following conditions:

(1) There is one and only one type of marriage allowed any person.

(2) The type of marriage allowed any person depends uniquely on his sex and
on the type of marriage of his parents.

(3) A man may always marry the daughter of his mother’s brother.

According to (2), we may write $m(M_i)$ for the type of marriage allowed a man
whose parents’ marriage was of type $M_i$, and $w(M_i)$ for the type allowed a woman
of similar extraction. Condition (3) may be expressed as $w(m(M_i)) = m(w(M_i))$.
We may assume that the functions m and w are permutations of the set $M_1,
M_2, \ldots, M_n$ (why?); thus, m and w are commuting elements of the permutation
group on n objects. What can one say about the group generated by m and w?
Discuss the special case where m and w generate a cyclic group. Classify, in this
case, the possible types of marriages. Is it ever allowed, in such a society, that a
man marry the daughter of his father’s sister?

† Appendix by A. Weil, in “Les structures élémentaires de la parenté,” by C. Lévi-Strauss.


5. Transformation groups; Sylow's theorems 


Let G be a group and a an element of G. The mapping $G \to G$ that sends an
arbitrary element x into ax is called left translation by a. Denoting the mapping
by $L_a$, we have then $L_a(x) = ax$. $L_a$ is a one-to-one mapping, as is easily verified,
hence is a permutation of the elements of G. In a similar way we define right
translation by the element a to be the mapping $R_a\colon G \to G$ defined by $R_a(x) = xa$.
It is also one-to-one. Observe that $L_a$ and $R_a$ are not homomorphisms of G unless
a is the identity of G. Given a, b in G, we have

$L_aL_b(x) = L_a(L_b(x)) = L_a(bx) = abx = L_{ab}(x)$

Hence

5.1    $L_aL_b = L_{ab}$

Similarly,

$R_bR_a = R_{ab}$

If we denote by S(G) the group of all permutations of the set G (see Sec. 3), then
each left translation $L_a$ is an element of S(G), and (5.1) says that the mapping
$G \to S(G)$ defined by $a \to L_a$ is a homomorphism. Since $L_a$ is clearly not the
identity mapping of G unless a = e, it follows that $a \to L_a$ maps G isomorphically
onto a subgroup of S(G).† In this way G can be thought of as a subgroup of the
permutation group S(G). In the older literature on groups it was customary to
view all groups as groups of permutations.
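
As a concrete illustration of the embedding just described, the sketch below takes G to be the six permutations of {0, 1, 2} (an assumed example, chosen only because it is small), realizes each left translation as a permutation of the positions 0, ..., 5 of the elements of G, and checks both the rule (5.1) and the fact that distinct elements give distinct translations. The names `compose`, `index`, and `left_translation` are hypothetical helpers.

```python
from itertools import permutations

def compose(p, q):
    """Return the permutation 'p after q' (apply q first, then p)."""
    return tuple(p[i] for i in q)

# Assumed example group: all permutations of {0, 1, 2}, listed in a fixed order.
G = list(permutations(range(3)))
index = {g: k for k, g in enumerate(G)}

def left_translation(a):
    """L_a as a permutation of positions: the position of x is sent to that of ax."""
    return tuple(index[compose(a, x)] for x in G)

# Rule (5.1): L_a L_b = L_ab for all a, b in G.
homomorphism_ok = all(
    compose(left_translation(a), left_translation(b))
    == left_translation(compose(a, b))
    for a in G
    for b in G
)

# The map a -> L_a is one-to-one, so G is isomorphic to its image in S(G).
injective = len({left_translation(a) for a in G}) == len(G)
```

The image is a six-element subgroup of the 720-element group of all permutations of the six positions, which is the sense in which G "can be thought of" as a permutation group.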


It is useful to formulate a definition which includes the foregoing considerations. 


DEFINITION 5.1 Given a group G and a set E, G is said to operate on E as a group of
transformations if to each a in G there is assigned a permutation $T_a$ of E such that
$T_aT_b = T_{ab}$ for any a, b in G.‡

In other words, it is merely required that there be given a homomorphism $G \to
S(E)$, where S(E) is the group of one-to-one mappings of E with composition of
mappings as group operation. Since a homomorphism sends identity into identity
and inverses into inverses, it follows that $T_e$ = identity map of E (where e =
identity element of G), and $T_{a^{-1}} = T_a^{-1}$ = inverse mapping of $T_a$.

† Similar considerations hold for right translations, except that $a \to R_a$ is an
“antihomomorphism,” since it sends ab into $R_bR_a$, not $R_aR_b$.

‡ More precisely, G is said to operate “from the left.” If $T_bT_a = T_{ab}$, then G is
said to operate from the right.

As an immediate example of Definition 5.1, a group G operates on itself by left
translations, as was pointed out above. If H is a subgroup of G, then left translation
(by an element of G) sends any left coset of H into another left coset. Hence
G operates by left translation on the set of left cosets of H. We shall make use of
this fact presently.

As a second example, if E is any set and if S(E) is the group of permutations of
E, then clearly S(E) operates on E pursuant to Definition 5.1. Any subgroup of
S(E) also operates on E.

The group of permutations of the integers 1, . . . , n operates on the polynomial
ring $Z[x_1, \ldots, x_n]$ in the manner described in the proof of Theorem 3.5.


Returning to Definition 5.1, instead of writing $T_a(x)$ for the result of $T_a$ applied
to an element x in E, it is easier just to write ax. In general this symbol does not
denote a group product, since a is in G and x in E. In the case of left translations,
however, E is the same as G, and ax is then indeed the product in G. In the
abbreviated notation, the requirement $T_aT_b = T_{ab}$ becomes simply

5.2    a(bx) = (ab)x        a, b in G, x in E

For the special case of left translations in G, this is just the associative law in G.
As was pointed out above, $T_e$ is the identity mapping of E, where e = identity
of G. In the new notation this is simply

5.3    ex = x        for all x in E


We now embark upon some considerations similar to those involved in connec- 
tion with Theorem 4.2. 

We observe first that if G operates on E according to Definition 5.1, then any
subgroup of G also operates on E as a transformation group, plainly.


DEFINITION 5.2 Let G operate as a group of transformations on a set E. Let x be an
element of E. The set of all elements ax (a an arbitrary element of G) is called the
G-orbit of x, denoted by Gx.

If H is a subgroup of G, then H also operates on E, as pointed out above, and
therefore the H-orbit Hx of x is also defined by Definition 5.2. It consists of all bx,
b here running through all elements of H, rather than G.


EXAMPLE 1 Let s be a permutation of a finite set E. The group of all permutations
S(E) is then finite, and consequently the sequence e, s, $s^2$, $s^3$, etc., must constitute
a cyclic subgroup (s) of S(E) (cf. Exercise 4, Sec. 2). Now S(E) operates on E,
and so does the cyclic subgroup (s). For x in E, the (s)-orbit, as just defined, is
none other than the s-orbit of x as defined in Sec. 3.


EXAMPLE 2 Let G operate on itself by left translations, and let H be a subgroup
of G, x an element of G. Then the H-orbit of x consists of all elements ax (a in H)
and is precisely the coset Hx.


THEOREM 5.1 Let a group G operate on a set E. Then every element of E is in one
and only one G-orbit.

Proof. Clearly x is in at least one G-orbit, namely Gx, for G contains
the identity e, and ex = x, by (5.3). Now suppose that two orbits Gy
and Gz have an element x in common. Then, by definition of orbit, there
are group elements a, b such that x = ay and x = bz. Therefore
$y = a^{-1}x = a^{-1}(bz) = (a^{-1}b)z$, by (5.2). Hence $cy = (ca^{-1}b)z$ for any c in G,
again using (5.2). This shows that Gy is a subset of Gz. By the same
reasoning, Gz is a subset of Gy, whence Gy = Gz. Hence, if two orbits
have a common element, then they are identical. Q.E.D.


This theorem includes the last part of the corollary of Theorem 4.1 (with right 
cosets in place of left cosets), as follows from Example 2 above. It also essentially 
includes Theorem 3.3, by the remark of Example 1. 
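
The partition into orbits asserted by Theorem 5.1 is easy to observe on a small case. The following sketch is illustrative only: it assumes E = {0, ..., 5} and takes for G the cyclic group generated by one permutation s (the situation of Example 1), computes the orbit of every point, and checks that the distinct orbits are pairwise disjoint and cover E. The function names are hypothetical.

```python
def compose(p, q):
    """Return the permutation 'p after q' (apply q first, then p)."""
    return tuple(p[i] for i in q)

def generate(gens, n):
    """Close the generators under composition; in a finite group this is a subgroup."""
    identity = tuple(range(n))
    group = {identity}
    frontier = [identity]
    while frontier:
        x = frontier.pop()
        for g in gens:
            y = compose(g, x)
            if y not in group:
                group.add(y)
                frontier.append(y)
    return group

# Assumed example: s cycles 0 -> 1 -> 2 -> 0, swaps 3 and 4, and fixes 5.
s = (1, 2, 0, 4, 3, 5)
G = generate([s], 6)
E = range(6)

def orbit_of(x):
    """The G-orbit Gx = {ax : a in G}, with ax meaning 'apply a to x'."""
    return frozenset(a[x] for a in G)

# Collect the orbit of every point; equal orbits collapse in the set.
all_orbits = {orbit_of(x) for x in E}

# Every point lies in exactly one orbit: the sizes add up to #E and the union is E.
covers = set().union(*all_orbits) == set(E)
disjoint = sum(len(o) for o in all_orbits) == len(E)
```

With this choice of s the orbits are the cycles of s, in agreement with the remark that the theorem essentially includes Theorem 3.3.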


DEFINITION 5.3 Let G operate on a set E, and let x be an element of E. All elements
a in G such that ax = x form a subgroup of G, called the stabilizer, or isotropy
subgroup, of x. We denote it by $G_x$.


THEOREM 5.2 Let a group G operate on a set E. If two elements x and y of E are in
the same G-orbit, then their stabilizers $G_x$ and $G_y$ are conjugate subgroups. That is,
there is an element c in G such that $G_y = cG_xc^{-1}$. In fact, there must be an element
c in G such that y = cx, and for any such c we have $G_y = cG_xc^{-1}$. In particular,
$G_x$ and $G_y$ have the same number of elements.
Proof. From Theorem 5.1 it follows that y is in the orbit Gx (since x
is there), and so y = cx for some c in G. Now a group element b is in
$G_y$ if by = y, that is, if bcx = cx. This holds if and only if $c^{-1}bcx = x$.
Therefore, b is in $G_y$ if and only if $c^{-1}bc$ is in $G_x$. It follows easily that
$c^{-1}G_yc = G_x$, or $G_y = cG_xc^{-1}$, and that the correspondence $b \to c^{-1}bc$ is a
one-to-one mapping, in fact an isomorphism, from $G_y$ to $G_x$. Q.E.D.


Now let us suppose that both G and E are finite. Consider an orbit Gx in E.
If we assign to any group element a the point ax in the orbit, we obtain a mapping
$G \to Gx$. Now ax = bx if and only if $b^{-1}ax = x$, that is, if and only if $b^{-1}a$ is in
the stabilizer $G_x$ of x. From the corollary of Theorem 4.1, that is true if and only
if the left cosets $aG_x$ and $bG_x$ are identical. Therefore our mapping $G \to Gx$
establishes a one-to-one correspondence between the left cosets of $G_x$ in G and the
elements in Gx. Hence, the number of elements in the orbit Gx is the same as the
number of left cosets of $G_x$, and that number, we recall, is the index of $G_x$ in G.
Applying Theorem 4.2, we obtain the following conclusion:


THEOREM 5.3 Let G operate on a set E, with both G and E finite. Then, for the
number of elements in any orbit Gx,

$\#(Gx) = \#G/\#G_x$

where $G_x$ is the stabilizer of x.
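
A quick numerical check of Theorem 5.3, offered as a sketch under assumed data rather than anything taken from the text: G is the group of all permutations of {0, 1, 2, 3}, E is the set of two-element subsets of {0, 1, 2, 3}, and a permutation operates on a subset by moving each of its members. The names `act`, `orbit`, and `stabilizer` are illustrative.

```python
from itertools import permutations

# Assumed example: G = all 24 permutations of {0, 1, 2, 3}.
G = list(permutations(range(4)))

def act(a, subset):
    """Apply the permutation a to every member of the subset."""
    return frozenset(a[i] for i in subset)

x = frozenset({0, 1})

# The orbit Gx and the stabilizer G_x, computed directly from the definitions.
orbit = {act(a, x) for a in G}
stabilizer = [a for a in G if act(a, x) == x]

# Theorem 5.3: #(Gx) = #G / #G_x.
orbit_stabilizer_holds = len(orbit) == len(G) // len(stabilizer)
```

Here the orbit consists of all six two-element subsets, and the stabilizer has four elements (swap or keep 0 and 1, swap or keep 2 and 3), so the count is 24/4 = 6.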

Again consider some G-orbit in E; we call it M. To each point x in M is attached
the stabilizer $G_x$, a subgroup of G. Now it may happen that different points of M
have the same stabilizer. This is the case, for example, if G is abelian, for then all
the stabilizers of elements of M, being conjugate, must be identical. We look into
this question briefly.

Let us collect together into bunches all the elements of the orbit M having the
same stabilizer. In this way we divide M into a certain number of subsets $U_1,
\ldots, U_r$, two elements x, y of M being in the same $U_i$ if and only if they have the
same stabilizer. We want to show that all the $U_i$ have the same number of elements.
Select an x in $U_1$, and suppose that $U_1$ has s elements. Since M = Gx,
we can write out the elements of $U_1$ in the form

5.4    $x, c_2x, c_3x, \ldots, c_sx$

where $c_2, \ldots, c_s$ are all in G. From Theorem 5.2 we have $G_{c_jx} = c_jG_xc_j^{-1}$. By
assumption, x and $c_jx$ have the same stabilizer, whence

5.5    $G_x = c_jG_xc_j^{-1}$        (j = 2, . . . , s)

Now let y be an element in some other $U_i$, say $U_2$. Since it is in M = Gx, we
have y = ax for some a in G. Consider the elements

5.6    $y = ax, ac_2x, \ldots, ac_sx$

By Theorem 5.2, the stabilizer of $ac_jx$ is

$G_{ac_jx} = (ac_j)G_x(ac_j)^{-1} = ac_jG_xc_j^{-1}a^{-1}$

By (5.5), this is

$G_{ac_jx} = aG_xa^{-1}$

But by Theorem 5.2 again,

$G_y = G_{ax} = aG_xa^{-1}$

Therefore all the elements (5.6) have the same stabilizer, hence are all in $U_2$ along
with y. The elements (5.6) are all distinct, assuming that the elements (5.4) are
distinct. Therefore, $U_2$ must contain at least s elements. Since $U_1$, $U_2$ were
arbitrarily selected U's, it follows that they all contain s elements. Furthermore,
referring to (5.6), any $U_i$ can be mapped into any other $U_j$ by applying a suitable
operation of G. Observe that the number of elements in M is the sum of the
number of elements in each $U_i$, hence is equal to rs. Finally, if $cG_xc^{-1}$ is any
conjugate of $G_x$, then by Theorem 5.2 it is the stabilizer of the element cx in the
orbit M. Consequently, r is the number of distinct conjugates of $G_x$ (including
$G_x$, naturally).


THEOREM 5.4 Let G be a finite group operating on a finite set E, and let M be any
G-orbit in E. Let r denote the number of different subgroups appearing among the
stabilizers of the elements of M. Then r is the number of distinct conjugates of any
one of those subgroups. Each of the subgroups is the stabilizer of a certain number s
of elements in M, the same number s for each of the r subgroups, and #M = rs.

We now give an application of the notions developed above. The following 
theorem is very important in the study of finite groups. 


THEOREM 5.5 (Sylow’s three theorems) Let G be a group of finite order n. Let p
be a prime, and suppose that $p^\alpha$ divides n ($\alpha$ a positive integer). Then (1) G contains
a subgroup of order $p^\alpha$; (2) if $p^\alpha$ is the highest power of p which divides n, then all the
subgroups of order $p^\alpha$ are conjugate; (3) the number of them is congruent to 1 modulo
p, and it divides $n/p^\alpha$.

Proof. Let E denote the set whose elements are the various subsets
A, B, etc., of G containing each exactly $p^\alpha$ elements. The number of
such sets is the binomial coefficient $\binom{n}{p^\alpha}$. That is, E has
$\binom{n}{p^\alpha}$ elements.
We require a simple fact about this number. Let $p^\beta$ be the largest power
of p which divides n = #G, so that

$n = p^\beta q, \qquad \beta \ge \alpha$

q being an integer not divisible by p. We claim that

5.7    #E is divisible by $p^{\beta-\alpha}$ but not by $p^{\beta-\alpha+1}$.

For

$\#E = \binom{n}{p^\alpha} = \frac{n(n-1)\cdots(n-p^\alpha+1)}{p^\alpha(p^\alpha-1)\cdots 1} = p^{\beta-\alpha}q\prod_{k=1}^{p^\alpha-1}\frac{p^\beta q-k}{p^\alpha-k}$

It is quickly seen that $p^\beta q - k$ and $p^\alpha - k$ are divisible by exactly the
same power of p for k = 1, . . . , $p^\alpha - 1$. Therefore the prime factorization
of #E contains exactly the power $p^{\beta-\alpha}$ (see Exercise 5, Sec. 5,
Chap. 3).
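
Claim (5.7) is purely arithmetical and can be spot-checked numerically. The sketch below is an illustration under stated assumptions, not a proof: it computes the exact power of p dividing the binomial coefficient for a handful of small values of p, $\beta$, q, $\alpha$ (the ranges are arbitrary choices) and compares it with $\beta - \alpha$. The function names are hypothetical.

```python
from math import comb

def p_adic_valuation(m, p):
    """Exponent of the highest power of p dividing the positive integer m."""
    v = 0
    while m % p == 0:
        m //= p
        v += 1
    return v

def check_claim(p, beta, q, alpha):
    """Check that p**(beta - alpha) exactly divides C(n, p**alpha) for n = p**beta * q."""
    n = p**beta * q
    return p_adic_valuation(comb(n, p**alpha), p) == beta - alpha

# Arbitrary small test ranges; q must not be divisible by p, and 1 <= alpha <= beta.
all_ok = all(
    check_claim(p, beta, q, alpha)
    for p in (2, 3, 5)
    for beta in range(1, 4)
    for q in (1, 2, 3, 4, 7)
    if q % p != 0
    for alpha in range(1, beta + 1)
)
```

For instance, with p = 2, $\beta$ = 3, q = 3, $\alpha$ = 1 one has n = 24 and $\binom{24}{2} = 276 = 2^2 \cdot 69$, so the exact power is $2^{3-1}$.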

Now let A be an element of E, hence a subset of G consisting of $p^\alpha$
elements. If b is any element of G, then bA also has $p^\alpha$ elements and
therefore is also an element of E. In this way G operates on E according
to Definition 5.1. Consider the stabilizer $G_A$ of some element A of E.
That is, $G_A$ consists of all b in G such that bA = A. We claim that

5.8    $\#G_A \le p^\alpha$

For let $a_1, a_2, \ldots, a_{p^\alpha}$ be the elements constituting A. If b is in
$G_A$, then $ba_1$ must appear in the list, say $ba_1 = a_j$. But then $b = a_ja_1^{-1}$,
showing that there are at most $p^\alpha$ possibilities for b.

By Theorem 5.1, the operation of G in E decomposes the latter into a
certain number of disjoint orbits, say M, M', M'', etc. Then

#E = (#M) + (#M') + (#M'') + etc.

From (5.7) it follows that not all the numbers on the right can be divisible
by $p^{\beta-\alpha+1}$. Suppose that to be the case for the orbit M. Thus

5.9    #M is not divisible by $p^{\beta-\alpha+1}$.

We henceforth deal only with this orbit. Let A be one of its elements, so
that M = GA, and let $G_A$ be its stabilizer. By Theorem 5.3 we have

$\#M = \#G/\#G_A = p^\beta q/\#G_A$

From (5.9) we see that $\#G_A$ must contain at least the power $p^\alpha$. Combining
this with (5.8), we conclude that

5.10    $\#G_A = p^\alpha$

This proves Sylow’s first theorem.
We now suppose that $\alpha = \beta$, that is, that $p^\alpha$ is the greatest power of
p which divides n. Then from (5.9) we have now

5.11    #M is not divisible by p.

Let H be any subgroup of G of order $p^\alpha$. Now G operates in the orbit
M itself, clearly, and therefore so does H. Hence H decomposes M into
certain H-orbits W, W', W'', etc. Since

#M = (#W) + (#W') + (#W'') + etc.

it follows from (5.11) that at least one of these H-orbits contains a number
of elements not divisible by p, say the orbit W. Select an element
in W, say B, so that W = HB, and let $H_B$ be its stabilizer. By Theorem 5.3
again we have

5.12    $\#W = \#H/\#H_B = p^\alpha/\#H_B$

But #W is not divisible by p, and therefore the only possibility is that
$\#H_B = p^\alpha$. In other words, $H = H_B$. But from (5.10), with B in place
of A, we have $\#G_B = p^\alpha$, from which it follows that $H = G_B$. Thus

5.13    Every subgroup of order $p^\alpha$ (maximum $\alpha$) is the G-stabilizer of an element
in the orbit M.


It follows from Theorem 5.2 that all these subgroups (they are called
Sylow subgroups) must be conjugate. This proves Sylow’s second theorem.

Finally, let $H_1, H_2, \ldots, H_r$ be all the different Sylow subgroups
(for the given prime p), that is, all the subgroups of order $p^\alpha$. From
(5.10) and (5.13), they are precisely the distinct stabilizers of the elements
in M. Fix attention on one of the $H_i$, and call it simply H. It is
the stabilizer of a certain number of elements in M, say $A_1, \ldots, A_s$.
Each of these elements forms by itself an H-orbit in M. By (5.12) it
follows that any other H-orbit contains a number of elements divisible
by p. Now #M is the sum of the number of elements in the various H-orbits,
hence is equal to s plus a multiple of p. That is,

5.14    $\#M \equiv s \pmod p$

By Theorem 5.4 we have #M = rs, so that

5.15    $s \equiv rs \pmod p$

From (5.11) and (5.14) we have $s \not\equiv 0 \pmod p$. Therefore, from (5.15),
it follows that

5.16    $r \equiv 1 \pmod p$

From above, #M = rs. From (5.10) and the equation preceding it, #M =
$n/p^\alpha$. Hence, r divides $n/p^\alpha$. This completes the proof of Sylow’s third
theorem. Q.E.D.
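
The three statements can be observed directly in a small group. The sketch below is an illustration under assumptions that are not in the text: it takes G to be the group of all 24 permutations of {0, 1, 2, 3}, and it finds subgroups by closing pairs of elements under composition. That search only sees subgroups generated by two elements, which is enough here because a subgroup of order 3 is cyclic and the subgroups of order 8 in this group are generated by two elements. All function names are hypothetical.

```python
from itertools import permutations

def compose(p, q):
    """Return the permutation 'p after q' (apply q first, then p)."""
    return tuple(p[i] for i in q)

def inverse(p):
    """Return the inverse permutation of p."""
    q = [0] * len(p)
    for i, image in enumerate(p):
        q[image] = i
    return tuple(q)

def generate(gens, n):
    """Close the generators under composition; in a finite group this is a subgroup."""
    identity = tuple(range(n))
    group = {identity}
    frontier = [identity]
    while frontier:
        x = frontier.pop()
        for g in gens:
            y = compose(g, x)
            if y not in group:
                group.add(y)
                frontier.append(y)
    return group

G = list(permutations(range(4)))  # assumed example group of order 24 = 2**3 * 3

def subgroups_of_order(order):
    """All subgroups of the given order that are generated by two elements of G."""
    found = set()
    for a in G:
        for b in G:
            H = frozenset(generate([a, b], 4))
            if len(H) == order:
                found.add(H)
    return found

def conjugate(H, c):
    """The conjugate subgroup c H c^{-1}."""
    c_inv = inverse(c)
    return frozenset(compose(compose(c, h), c_inv) for h in H)

sylow_2 = subgroups_of_order(8)  # highest power of 2 dividing 24
sylow_3 = subgroups_of_order(3)  # highest power of 3 dividing 24

def all_conjugate(family):
    """True if the conjugates of any one member make up the whole family."""
    return all({conjugate(H, c) for c in G} == family for H in family)
```

In this example there are three subgroups of order 8 and four of order 3; each count is congruent to 1 modulo the prime and divides 24/8 = 3 and 24/3 = 8 respectively.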


EXAMPLE 3 Let G be a group of order 42 = 2 · 3 · 7. By Sylow’s first theorem,
G contains a subgroup H of order 7. We claim it is a normal subgroup. By
Sylow’s third theorem, the number r of distinct conjugates $aHa^{-1}$ of H divides
42/7 = 6 and is congruent to 1 (mod 7). The only possibility is r = 1. Therefore
H is equal to all its conjugates, hence is normal. The factor group G/H has
order 6, and the same argument can be repeated for it.
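
The arithmetic used in Example 3 can be mechanized. The following is a small sketch (the function name `sylow_count_candidates` is an illustrative choice): given n and a prime p dividing n, it lists the numbers r permitted by Sylow's third theorem, namely the divisors of $n/p^\alpha$ that are congruent to 1 modulo p, where $p^\alpha$ is the highest power of p dividing n. When the list is just [1], the subgroup of order $p^\alpha$ is necessarily normal.

```python
def sylow_count_candidates(n, p):
    """Divisors r of n / p**alpha with r ≡ 1 (mod p), p**alpha the top power of p in n."""
    m = n
    while m % p == 0:
        m //= p  # strip every factor p, leaving m = n / p**alpha
    return [r for r in range(1, m + 1) if m % r == 0 and r % p == 1]

# Example 3: order 42, prime 7. The only candidate is 1, so the subgroup is normal.
candidates_42_7 = sylow_count_candidates(42, 7)

# The same argument for the factor group of order 6 and the prime 3.
candidates_6_3 = sylow_count_candidates(6, 3)

# For the prime 2 in order 6 the theorem allows 1 or 3, so normality is not forced.
candidates_6_2 = sylow_count_candidates(6, 2)
```

The test only narrows the possibilities; a list with more than one entry does not by itself tell which value actually occurs in a particular group.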

The following theorem gives an example of an application of Sylow’s theorems. 


THEOREM 5.6 (Sylow) If G is a group of order $p^\alpha$ (p a prime), then every subgroup
of order $p^{\alpha-1}$ is a normal subgroup, and there is at least one such subgroup.

Proof. By Sylow’s first theorem, there is at least one subgroup H of
order $p^{\alpha-1}$. Now let E denote the set of left cosets of H in G. By Theorem
4.2, E contains p elements. Now H operates on E by left translations.
Thus, if a is an element of H (or G), and if bH is an element of E,
then abH is a new element of E, and thus a produces a permutation of E.
(It is easily seen that Definition 5.1 is satisfied.) By Theorem 5.1, the
operation of H on E decomposes E into a certain number of disjoint
H-orbits, say $M_1, M_2, \ldots, M_r$. We have then

5.17    $p = \#E = (\#M_1) + \cdots + (\#M_r)$

Now one of the orbits contains the coset H, say the orbit $M_1$. It is clear
that $M_1$ must then consist of just that one coset H. It follows from
(5.17) that r > 1.

Consider now any coset aH, and let $M_j$ be the orbit of E that contains
it. Thus $M_j$ consists of all cosets caH, where c is in H. The stabilizer
of aH therefore consists of all c in H such that caH = aH. That is so
if and only if ca = ax for some x in H, or $c = axa^{-1}$. Hence, the stabilizer
of aH is the subgroup K consisting of all elements common to both
H and $aHa^{-1}$.

Now any subgroup of H, in particular the stabilizer K, must contain
$p^\beta$ elements for some $\beta$, by Theorem 4.2. By Theorem 5.3, $\#M_j =
\#H/\#K = p^{\alpha-1}/p^\beta$. From (5.17) and the fact that r > 1 we conclude
that $\#M_j = 1$, for otherwise it would have to be at least p. Hence K
contains the same number of elements as H, namely, $p^{\alpha-1}$. Since $aHa^{-1}$
also contains $p^{\alpha-1}$ elements, it follows that $H = aHa^{-1}$. Therefore H is
the same as all its conjugates and consequently is normal. Q.E.D.


THEOREM 5.7 Let p and q be primes with $p \ge q$ and $p \not\equiv 1 \pmod q$. Then any
group of order pq is abelian.


Proof. Let G have order pq. We want to show first that G contains
two normal subgroups H, K of order p and q, respectively.

Case 1. p > q. By Sylow’s first theorem G contains subgroups H,
K of orders p, q, respectively. By Sylow’s third theorem the number of
conjugates of H must divide q and must be congruent to 1 (mod p).
Since q < p, it follows that H must coincide with all its conjugates and is
therefore normal. Similarly, the number of conjugates of K must divide
p and must be congruent to 1 (mod q). By assumption, $p \not\equiv 1 \pmod q$,
and therefore there is only one conjugate, K itself. Hence K is normal.

Case 2. p = q. By Theorem 5.6, G contains a subgroup H of order
p, and any such subgroup is normal. Take an element c not in H. Then
c has order p or $p^2$ (Theorem 4.2). In the latter case G is just the cyclic
subgroup (c) generated by c, hence is abelian. We therefore assume that
c has order p. It then generates a cyclic subgroup K containing p elements.
K is a normal subgroup of G, by Theorem 5.6.

Thus in either case we have a subgroup H of order p and a subgroup
K of order q, both of them normal. Furthermore, they have only the identity
element in common. For the set of elements common to H and K is a
subgroup L of both H and K. Its order must divide both p and q (Theorem
4.2). In Case 1 (p > q) it follows that L must have order 1; hence
L = (e). The same conclusion holds for Case 2 (p = q), for otherwise
L would have order p and would therefore have to coincide with both
H and K, contradicting the choice of c.

Consider now the left cosets aK of K with a in H. No two of them
can be the same, for if aK = a'K (a, a' in H), then $a^{-1}a'$ is in K (Theorem
4.1, Corollary); but $a^{-1}a'$ is also in H, since H is a subgroup of G.
From the remarks above, $a^{-1}a' = e$, or a = a'. Therefore, since there
are exactly p left cosets of K in G and since H contains p elements, it
follows that the left cosets of K appear exactly once among the aK, with
a in H. Every element of G is in one and only one left coset, and we
conclude that every element of G can be expressed uniquely in the form

5.18    ab, with a in H, b in K

Now choose any a in H, b in K. Since H and K are normal, we have
bH = Hb and Ka = aK. It follows that ba = a'b for some a' in H,
and ba = ab' for some b' in K. Thus a'b = ab', and from the uniqueness
of the expression (5.18) we conclude that a' = a, b' = b, whence ab = ba.

This shows that elements of H commute with elements of K. Consider
now any two elements of G. They can be expressed in the form
(5.18), say ab and $a_1b_1$. We have then $(ab)(a_1b_1) = a(ba_1)b_1 = a(a_1b)b_1 =
(aa_1)(bb_1)$. But both H and K are cyclic, hence abelian (Theorem 4.2,
Corollary). Hence $(ab)(a_1b_1) = (a_1a)(b_1b) = (a_1b_1)(ab)$, showing that G is
commutative. Q.E.D.


For example, any group of order 4 must be abelian, or any group of order 15,
etc. Groups of order 6 violate the condition $p \not\equiv 1 \pmod q$. The symmetric
group $S_3$ on three elements has order 6 and is not abelian. In general, if
$p \equiv 1 \pmod q$, then there are both abelian and non-abelian groups of order pq.

The two theorems above serve to illustrate some applications of Theorems 5.3 
and 5.5 to the study of finite groups. There are a number of other important 
theorems closely related to the foregoing, but we shall not go into them here. 


EXERCISES 

1. Prove that any group of order 231 contains normal subgroups of orders 7
and 11.

2. Let G operate on a set E, and let M be a G-orbit. Let H be the stabilizer
of an element in M. Let N be the normalizer of H (see Exercise 4, Sec. 4). Prove
that the number of elements in M having H as stabilizer is equal to the index of
H in N. Prove that the number of different subgroups appearing among the
stabilizers of elements of M is equal to the index of N in G. Deduce from these
facts a proof of Theorem 5.4.

3. Let a group G operate on a set E, both finite, and let H be a normal subgroup
of G. Let M be a G-orbit in E. Prove that the H-orbits in M all contain
the same number of elements. [Hint: Use Theorem 5.3.]


4. A group G is said to operate transitively on a set E if for every pair x, y in
E there is an element a in G such that ax = y. Suppose now that G and E are
finite and that G has at least as many elements as E. Assume further that there
is some x in E which is not left fixed by any a in G other than the identity. Prove
that G operates transitively on E.

5. Let S(E) be the group of permutations of a set E. Prove that the stabilizers
in S(E) of any two elements of E are conjugate subgroups.

6. Let $B = Q[x_1, \ldots, x_n]$ denote the ring of polynomials in n independent
variables $x_1, \ldots, x_n$ over the rational field. Let S denote the group of
permutations of the set $\{x_1, \ldots, x_n\}$. We make S operate on B as follows: If
s is an element of S and $f(x_1, \ldots, x_n)$ any element of B, then we define
$sf(x_1, \ldots, x_n)$ to be the polynomial $f(sx_1, \ldots, sx_n)$, where $sx_i$ denotes the
result of applying the permutation s to $x_i$. Show that each s determines in this way a ring
isomorphism B — B, and show that the requirements of Definition 5.1 are satis- 
fied. Taking x = 4, what are the stabilizers of the elements x, — x, 2%, — x, 
ay + 2 +23, T2224? Describe the S-orbits of B, for arbitrary n. Which ele- 
ments of B are left fixed by all elements of S? 

7. Let p, q be primes, with p > q and $p \not\equiv \pm 1 \pmod q$. Prove that any group
of order $p^2q$ must be abelian.

8. Give conditions on two primes p, q such that any group of order $pq^2$ is abelian.
Prove your contention.


6. The Jordan-Hölder theorem


In order to understand the structure of a given group G there are two main
devices available:

(1) Determining the subgroups of G, in the hope that some of them will turn
out to be more or less familiar, for example, cyclic groups (any group contains
some cyclic subgroups, by Theorem 2.2). In this way it is often possible to split
G into simpler pieces. The subgroups of interest in this connection are the normal
subgroups of G.

(2) Determining homomorphisms $f\colon G \to G'$, where G' is a group with which
one feels at ease. For this purpose it is fruitful to take for G' the group of
automorphisms (i.e., one-to-one linear mappings) of a vector space. The image f(G)
is a subgroup of G' which reflects some of the structure of G. As we have seen,
a homomorphism f gives rise to a normal subgroup of G, namely, the kernel of f,
and therefore this method is not entirely separated from that of (1) above.

The following definition singles out those groups which cannot be decomposed 
conveniently into simpler parts. 


DEFINITION 6.1 A group G is said to be simple if its only normal subgroups are the
group G itself and the subgroup (e) consisting of the identity element alone.

Thus, if G is simple and if $f\colon G \to G'$ is a homomorphism, then the kernel of
f is either G or (e). Hence f either maps all of G into the identity of G', or else
f maps G isomorphically onto a subgroup of G'.


EXAMPLE 1 Let G be abelian, so that every subgroup is normal. Let a be an
element of G, a ≠ e. Then the cyclic subgroup (a) generated by a is a normal
subgroup. Hence, if G is simple, then (a) = G, and therefore G is cyclic. It cannot
be infinite cyclic (i.e., isomorphic to $Z^+$), for such a group contains infinitely
many subgroups. Therefore a has finite order m. That is, (a) consists of
$e, a, \ldots, a^{m-1}$, with m > 1, since a ≠ e. We claim that m is a prime. For if
m = hk (h, k integers >1), then $b = a^h$ has order k, and the cyclic subgroup (b) is not
equal to either (e) or G, contradicting the assumption that G is simple. It follows
at once that an abelian group is simple if and only if it is cyclic of prime order.
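
The conclusion of Example 1 can be checked by machine for small cyclic groups. The sketch below is only an illustration, not part of the text's argument. It assumes the standard fact that every subgroup of the additive group $Z_m$ is generated by a divisor of m, and the helper names are hypothetical.

```python
def subgroups_of_Zm(m):
    """Return every subgroup of the additive group Z_m as a frozenset."""
    # Assumption: each subgroup of a cyclic group is cyclic, generated by a divisor d of m.
    return [frozenset(range(0, m, d)) for d in range(1, m + 1) if m % d == 0]

def is_simple_Zm(m):
    """Z_m (m > 1) is simple iff its only subgroups are {0} and Z_m itself."""
    return m > 1 and len(subgroups_of_Zm(m)) == 2

def is_prime(m):
    return m > 1 and all(m % d for d in range(2, int(m ** 0.5) + 1))

# Orders m for which Z_m is simple; these should be exactly the primes.
simple_orders = [m for m in range(2, 31) if is_simple_Zm(m)]
print(simple_orders)
```

Since $Z_m$ is abelian, every subgroup found here is normal, so counting subgroups is enough to decide simplicity.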

We shall see presently that any finite group can be decomposed in a certain 
way into simple groups. 

The following definition is closely connected with method (1) mentioned above, 
and it is analogous to Definition 5.2, Chap. 8. 


DEFINITION 6.2 Let $H_1, \ldots, H_r$ be subgroups of a group G. Then G is said to
be their direct sum (or direct product) under the following conditions:

(1) If $x_i$ is in $H_i$ and $x_j$ is in $H_j$, with i ≠ j, then $x_i x_j = x_j x_i$.

(2) Any element x of G can be expressed in one and only one way as a product

$x = x_1 x_2 \cdots x_r$

with $x_i$ in $H_i$ (i = 1, 2, ..., r).

Observe that each $H_i$ must then be a normal subgroup of G. For from (2) and
repeated applications of (1) we have
$H_i x = H_i x_1 \cdots x_r = x_1 H_i x_2 \cdots x_r = \cdots = x_1 \cdots x_r H_i = x H_i$.

From (2) it follows that if $x_1 x_2 \cdots x_r = e$ ($x_i$ in $H_i$), then $x_i = e$ for
i = 1, ..., r.


EXAMPLE 2 Consider the group G with the multiplication table


    | e  a  b  c
  --+-----------
  e | e  a  b  c
  a | a  e  c  b
  b | b  c  e  a
  c | c  b  a  e


The elements e, a form a subgroup $H_1$ and the elements e, b form a subgroup $H_2$.
Every element of G occurs just once in the list ee, eb, ae, ab. Since G is abelian,
condition (1) holds, and it follows that G is the direct sum of $H_1$ and $H_2$. Both


† If G is abelian, with additive notation, then naturally we speak of the sum
$x_1 + x_2 + \cdots + x_r$. In this case (1) is superfluous.




$H_1$ and $H_2$ are cyclic of order 2, that is, isomorphic to $Z_2^+$. The group G is
sometimes called the Klein 4-group. Note that e, c form another cyclic subgroup of
order 2.
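
Conditions (1) and (2) of Definition 6.2 can be verified mechanically for the Klein 4-group. The following is a minimal sketch; the dictionary encoding of the multiplication table and the function names are choices made for this illustration only.

```python
from itertools import product

# Multiplication table of Example 2, stored as TABLE[x][y] = xy.
TABLE = {
    "e": {"e": "e", "a": "a", "b": "b", "c": "c"},
    "a": {"e": "a", "a": "e", "b": "c", "c": "b"},
    "b": {"e": "b", "a": "c", "b": "e", "c": "a"},
    "c": {"e": "c", "a": "b", "b": "a", "c": "e"},
}

def mul(x, y):
    return TABLE[x][y]

def is_direct_sum(group, h1, h2):
    """Check conditions (1) and (2) of Definition 6.2 for two subgroups."""
    # (1) elements of different subgroups commute
    commute = all(mul(x, y) == mul(y, x) for x, y in product(h1, h2))
    # (2) every element of the group occurs exactly once among the products xy
    products = [mul(x, y) for x, y in product(h1, h2)]
    unique = sorted(products) == sorted(group)
    return commute and unique

G = ["e", "a", "b", "c"]
print(is_direct_sum(G, ["e", "a"], ["e", "b"]))
print(is_direct_sum(G, ["e", "a"], ["e", "a"]))
```

The second call fails condition (2), since a subgroup paired with itself does not produce every element of G.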


EXAMPLE 3 Consider a group G of order pq, where p and q are primes satisfying
the conditions of Theorem 5.7. From the proof of that theorem [cf. (5.17)] it
is easily seen that G is the direct sum of the subgroups H, K of orders p, q,
respectively, or else G is cyclic of order $p^2$. For example, a group of order 4 must
be either the direct sum of two subgroups of order 2, hence isomorphic to the
Klein 4-group of the preceding example, or it must be isomorphic to $Z_4^+$.


EXAMPLE 4 Let $H_1, \ldots, H_r$ be given groups (multiplicative notation). Let
G consist of all r-tuples $x = (x_1, \ldots, x_r)$, with $x_i$ in $H_i$ (i = 1, ..., r). Define
multiplication in G by $(x_1, \ldots, x_r) \cdot (y_1, \ldots, y_r) = (x_1y_1, x_2y_2, \ldots, x_ry_r)$.
It is easily verified that G with this operation is a group. Furthermore, if $H_i'$
denotes the subset of G consisting of all elements $(e_1, \ldots, x_i, \ldots, e_r)$ in
which the jth entry is $e_j$ = identity of $H_j$, for j ≠ i, the ith entry unrestricted,
then $H_i'$ is a subgroup of G, and it is isomorphic to $H_i$ in a natural way. G is the
direct sum of $H_1', \ldots, H_r'$. (This construction is analogous to that of Exercise
2, Sec. 3, Chap. 8, concerning vector spaces.) The group G constructed in the
manner indicated from $H_1, \ldots, H_r$ is called the direct sum of the $H_i$.


If a group G is a direct sum of subgroups $H_1$ and $H_2$, then $G/H_1$ is isomorphic
in a natural way to $H_2$. For write x in G in the form $x = x_1x_2$, with $x_1$ in $H_1$ and
$x_2$ in $H_2$. Then $x_1$, $x_2$ are uniquely determined by x, and the assignment $x \to x_2$
defines a homomorphism $G \to H_2$ whose kernel is $H_1$. The assertion follows from
Theorem 4.4.

More generally, let G be the direct sum of subgroups $H_1, \ldots, H_r$. From
(1) of Definition 6.2 we have $H_iH_j = H_jH_i$. From this and (4.3) it follows that
$H_{i+1}H_{i+2} \cdots H_r$ is a subgroup of G, call it $G_i$ (i = 0, 1, ..., r − 1).
Moreover $G_i$ is the direct sum of $H_{i+1}, \ldots, H_r$ and it is a normal subgroup of $G_{i-1}$
(in fact, of G), the argument being essentially that following Definition 6.2.
Denoting by $G_r$ the identity subgroup (e), we have then


(6.1)  $G = G_0 \supset G_1 \supset \cdots \supset G_{r-1} \supset G_r = (e)$


where the symbol $\supset$ stands for "contains." We have $G_{i-1} = H_iH_{i+1} \cdots H_r = H_iG_i$,
by definition, and $G_{i-1}$ is the direct sum of $H_i$ and $G_i$. Consequently


(6.2)  $G_0/G_1 \cong H_1,\quad G_1/G_2 \cong H_2,\quad \ldots,\quad G_{r-1}/G_r \cong H_r$


where the symbol $\cong$ stands for "is isomorphic to."

Hence, a decomposition of G into a direct sum gives rise to a descending
sequence (6.1) of normal subgroups, and the quotient groups of the successive
subgroups (6.2) are isomorphic to the $H_i$.




In general it is impossible to split a group into a direct sum, but for any finite 
group it is possible to achieve a situation of the type (6.1) for which the quotient 
groups (6.2) are simple groups, in the sense of Definition 6.1. We now look into 
this question. 


DEFINITION 6.3 Let G be a group and K a normal subgroup. Then K is said to
be maximal if K ≠ G and if K is not contained in any larger normal subgroup of
G other than G itself.


THEOREM 6.1 A normal subgroup K of a group G is maximal if and only if the
quotient group G/K is simple and contains more than one element.

Proof. Suppose K is maximal. Then, by definition, K ≠ G and so
G/K contains more than one coset. G/K must be simple. For let $\bar{H}$
be a normal subgroup of G/K. By Theorem 4.6 there is a normal
subgroup H of G which contains K and which is mapped onto $\bar{H}$ by the
canonical homomorphism G → G/K. Then, by Definition 6.3, either
H = K or H = G. Therefore either $\bar{H}$ is the identity subgroup of G/K
or else coincides with G/K.

Conversely, suppose that G/K is simple and contains more than one
element, so that K ≠ G. Let H be a normal subgroup of G which
contains K. By Theorem 4.5, the canonical homomorphism maps H onto
a normal subgroup $\bar{H}$ of G/K, and by Theorem 4.6 we have $\bar{H} = HK/K$.
Since H contains K, it follows that HK = H, and so $\bar{H} = H/K$. By
assumption either $\bar{H} = G/K$ or else $\bar{H}$ is the identity subgroup of G/K.
In the first case we have H/K = G/K, and we claim that H = G. For
let x be an element of G. Then there is an element y in H such that
xK = yK, and so $y^{-1}x$ is in K, hence in H. Therefore $y(y^{-1}x) = x$ is in
H. In the second case, H/K is the identity subgroup of G/K, whence
plainly H = K. Therefore there are no normal subgroups in G between
K and G. Q.E.D.


THEOREM 6.2 Let K be a normal subgroup of a finite group G, with K ≠ G. Then
K is contained in at least one maximal normal subgroup.
Proof. If K is not already maximal, then there is a larger normal
subgroup K′ ≠ G containing K. If K′ is not maximal, then there is a
yet larger normal subgroup K″ ≠ G containing K′, etc. Since G is
finite, this process must lead after a finite number of steps to a maximal
normal subgroup of G which contains K. Q.E.D.


THEOREM 6.3 Let G be a finite group containing more than one element. Then
there exists a series of subgroups $G_0, G_1, \ldots, G_r$ such that

(1) $G_0 = G$ and $G_r = (e)$ = identity subgroup.

(2) $G_i$ is a maximal normal subgroup of $G_{i-1}$ for i = 1, ..., r.




Furthermore, if K is any normal subgroup of G, then a series of the above type can be
found such that K is one of its members.

Proof. Take any normal subgroup K ≠ G, possibly the identity subgroup
(e). By Theorem 6.2, it is contained in a maximal normal subgroup
$G_1$ of $G = G_0$. If $G_1 \neq K$, then again, by Theorem 6.2 applied to
$G_1$, there is a maximal normal subgroup $G_2$ of $G_1$ which contains K.
Continuing in this way we eventually arrive at K itself. If K ≠ (e), then we
simply continue on with K in place of G, and we obviously obtain a
series of the required sort. Q.E.D.


Remark. The proofs of Theorems 6.2 and 6.3 contain ill-concealed induction
arguments.


DEFINITION 6.4 A series of subgroups satisfying conditions (1) and (2) of Theorem
6.3 is called a composition series for G. The factor groups $H_i = G_{i-1}/G_i$ are called
the factors of composition of G. They are simple groups, by Theorem 6.1.

From the proof of Theorem 6.3 it is clear that a composition series for G is in
general far from unique. However, any two composition series are intimately
related, and that is the subject of the Jordan-Hölder theorem below. It is an easy
consequence of the following theorem:


THEOREM 6.4 Let $G_1$ and $G_1'$ be two distinct maximal normal subgroups of a group
G, and let K be the set of elements common to both of them. Then K is a maximal
normal subgroup of $G_1$ and $G_1'$. Furthermore the groups $G/G_1'$ and $G_1/K$ are
isomorphic in a natural way, and similarly for the groups $G/G_1$ and $G_1'/K$.

Proof. Since $G_1$ and $G_1'$ are normal, we have $G_1G_1' = G_1'G_1$ and
therefore $G_1G_1'$ is a subgroup of G, by (4.3). It is a normal subgroup, for
$a(G_1G_1') = (aG_1)G_1' = (G_1a)G_1' = G_1(aG_1') = (G_1G_1')a$ for any a in G. But
$G_1G_1'$ contains both $G_1$ and $G_1'$. Since $G_1 \neq G_1'$, we have $G_1G_1' \neq G_1$. Since
$G_1$ is maximal, it follows that $G_1G_1' = G$.

Now by Theorem 4.6, the groups $G_1/(G_1 \cap G_1')$ and $G_1G_1'/G_1'$ are
isomorphic in a natural way. That is, $G_1/K$ and $G/G_1'$ are isomorphic in a
natural way. But $G/G_1'$ is simple, by Theorem 6.1, and therefore so is
$G_1/K$. Consequently, K is a maximal normal subgroup of $G_1$, again by
Theorem 6.1. A similar argument applies to $G_1'/K$ and $G/G_1$. Q.E.D.


THEOREM 6.5 (Jordan-Hölder) Let G be a finite group containing more than one
element, and let $G_0, G_1, \ldots, G_r$ and $G_0', G_1', \ldots, G_s'$ be two composition series
for G. Then r = s, and the factors of composition $G_{i-1}/G_i$ of the first series are
isomorphic, in some order, to the factors of composition of the second series.

Proof. We prove this by induction on the order n of G. For the
smallest admissible value of n, namely, 2, the statement is true. For
any group G of order 2 is a simple group, being cyclic of prime order.




A composition series for a simple group can consist only of the two terms
G and (e).

Suppose then that the theorem holds for all groups of order <n, and
let G have order n. If $G_1 = G_1'$, the induction assumption applied to
$G_1$ gives the result at once, so suppose $G_1 \neq G_1'$. Let K be the set of
elements common to $G_1$ and $G_1'$. Either K = (e), or by Theorem 6.3 we can
find a composition series K, K′, K″, ..., for K, the last term of which is (e).
It follows at once from Theorem 6.4 that the series


(6.3)  $G_1, K, K', K'', \ldots, (e)$
and
(6.4)  $G_1', K, K', K'', \ldots, (e)$


are composition series for $G_1$, $G_1'$, respectively. By assumption,


(6.5)  $G_1, G_2, \ldots, G_r$
and
(6.6)  $G_1', G_2', \ldots, G_s'$


are also composition series for $G_1$, $G_1'$, respectively. By induction
assumption, the number of terms in (6.3) must be the same as the number
of terms in (6.5). Similarly for (6.4) and (6.6). Hence r = s. By
induction assumption again, the factors of composition


$G_1/K,\ K/K',\ K'/K''$, etc.


obtained from (6.3) must be isomorphic, in some order, to those obtained
from (6.5):


$G_1/G_2,\ G_2/G_3,\ G_3/G_4$, etc.

Similarly for

$G_1'/K,\ K/K',\ K'/K''$, etc.

$G_1'/G_2',\ G_2'/G_3',\ G_3'/G_4'$, etc.

But $G/G_1$ is isomorphic to $G_1'/K$, by Theorem 6.4; and $G/G_1'$ is isomorphic
to $G_1/K$. Hence, the groups


$G/G_1,\ G_1/G_2,\ G_2/G_3$, etc.
must be isomorphic, in some order, to
$G/G_1',\ G_1'/G_2',\ G_2'/G_3'$, etc.


This completes the proof.
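
The theorem can be watched at work on a small case. The sketch below lists every composition series of a cyclic group of order 12 and the orders of the corresponding factors of composition. It is an illustration only. It assumes the standard fact that a cyclic group has exactly one subgroup of each order dividing its order, so a series is recorded simply by the orders of its terms; the function names are hypothetical.

```python
def prime_divisors(n):
    """Prime numbers dividing n."""
    return [p for p in range(2, n + 1)
            if n % p == 0 and all(p % q for q in range(2, p))]

def composition_series(order):
    """Yield all composition series of the cyclic group of the given order.

    A series is recorded as the list of orders of its terms G_0, G_1, ..., G_r.
    """
    if order == 1:
        yield [1]
        return
    for p in prime_divisors(order):
        # The subgroup of index p is maximal: the quotient is cyclic of prime order.
        for tail in composition_series(order // p):
            yield [order] + tail

def factor_orders(series):
    """Orders of the factors of composition G_{i-1}/G_i."""
    return [a // b for a, b in zip(series, series[1:])]

all_series = list(composition_series(12))
for s in all_series:
    print(s, factor_orders(s))
```

Each series yields the same factor orders up to rearrangement, which is what Theorem 6.5 predicts (a cyclic group of prime order being determined up to isomorphism by its order).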


It is possible to give a rather more refined analysis of composition series, but 
we shall not pursue the matter further except in Sec. 7 for abelian groups. 




The Jordan-Hölder theorem shows that with any finite group G we can
associate, by means of any composition series, a sequence of simple groups
$H_1, H_2, \ldots, H_r$. These groups, apart from the order in which they appear, are uniquely
determined by G up to isomorphism. In this sense they do not depend upon the
particular composition series. The knowledge of these groups gives a considerable
amount of information about the structure of G, but not enough to reconstruct
G from the $H_i$, in general. Observe that if we construct the direct sum G′ of
$H_1, \ldots, H_r$ according to Example 4, then G′ will have the same factors of
composition as G, as follows easily from the discussion concerning (6.1) and
(6.2). However G′ may not be isomorphic to G.


DEFINITION 6.5 A finite group G is called solvable, or metacyclic, if the factors of
composition of G are all abelian.

The factors of composition, being simple, must then be cyclic of prime order
(see Example 1 above).

The notion introduced in the foregoing definition is of great importance. By
means of Galois theory it is possible to attach to any irreducible polynomial


$a_0x^n + a_1x^{n-1} + \cdots + a_{n-1}x + a_n$,


with coefficients in the rational field Q,† a certain finite group G, called the Galois
group of the polynomial. Roughly speaking, G describes the possible permutations
of the set of roots of the polynomial in the complex field C. Galois theory shows
that the roots of the polynomial can be obtained from the coefficients $a_i$ by ordinary
field operations (+, −, ×, ÷) and root extractions if and only if the Galois group
G is solvable. For polynomials of degree >4, that is not usually the case. In this
way it can be shown that there is no "formula," analogous to the quadratic
formula, for equations of degree >4 (cf. Sec. 6, Chap. 6).


THEOREM 6.6 Any group of prime power order $p^\alpha$ is solvable.
Proof. This follows at once from Theorem 5.6.


THEOREM 6.7 The symmetric group $S_n$ of all permutations of n objects is not solvable
if n > 4.

We omit the proof of this theorem, referring to the books cited in the
Introduction for details. This theorem is closely connected with the reference to Galois
theory above. For the Galois group of "most" polynomials of degree n is
precisely $S_n$.


† With minor changes, Q can be replaced here by any field.


EXERCISES


1. Find a composition series for the symmetric group S5.
2. Find a composition series for the symmetric group Ss.
3. Let f be a homomorphism of a group G onto a group G′, and let
$\{G_0, G_1, \ldots, G_r\}$ be a composition series for G. Analyze the series of subgroups
$\{f(G_i)\}$ in G′.
4. Let G be the direct sum of cyclic subgroups H, K of orders m, n, respectively.
If m, n are relatively prime, prove that G is cyclic.
5. Prove that any group of order 4 must be isomorphic either to $Z_4^+$ or to the
Klein 4-group (i.e., the direct sum of two copies of $Z_2^+$).
6. Let $K_1, K_2, \ldots, K_s$ be distinct normal subgroups of a finite group G, with
$K_1 \supset K_2 \supset \cdots \supset K_s$. Prove that there is a composition series for G which
includes the $K_i$ among its terms.
7. Let G be a group of order pq, where p and q are distinct primes. Prove
that G is cyclic.
8. Prove that the order of each factor of composition of a finite group G must
divide the order of G.
9. Determine the maximal subgroups of $Z^+$.
10. Let G be the direct sum of subgroups $H_1, \ldots, H_r$ of orders $n_1, \ldots, n_r$
respectively. Prove that G has order $n_1n_2 \cdots n_r$.
11. Let G be an abelian group, and let H, K be subgroups. Prove that G is their
direct sum if and only if $H \cap K = (e)$ and HK = G.
12. Let a group G be a direct sum of subgroups $H_1, \ldots, H_r$. For i = 1, ...,
r let $f_i$ be a homomorphism from $H_i$ to an abelian group G′. Prove that there is a
unique homomorphism f: G → G′ such that $f(x) = f_i(x)$ whenever x is in $H_i$
(i = 1, ..., r).


7. Finite abelian groups 


We close this introduction to group theory by showing that finite abelian groups
can be split into direct sums of cyclic groups. This represents a considerable
refinement over the mere existence of a composition series, as guaranteed by Theorem
6.3. We shall continue here with the multiplicative notation.

Let G be an abelian group of order n, and let q be a prime which divides n. Then


(7.1)  G contains a subgroup of order q.


This follows from Sylow's theorem (Theorem 5.5) but can be proved more simply
as follows: Let $c_1, c_2, \ldots, c_n$ be all the elements in G, and let $n_i$ be the order of $c_i$.
That is, $n_i$ is the least positive integer such that $c_i^{n_i} = e$. Consider the expression

(7.2)  $c_1^{k_1} c_2^{k_2} \cdots c_n^{k_n}, \qquad 0 \le k_i < n_i$ for i = 1, ..., n.

It is easily seen, using the fact that G is abelian, that (7.2) represents every element
of G (for suitable choices of the $k_i$) as many times as it represents the identity.
There are $n_1n_2 \cdots n_n$ choices of the n-tuple $(k_1, \ldots, k_n)$, and so that number
must be divisible by n, hence also by q. Then q must divide some $n_i$, and the
element $c' = c_i^{n_i/q}$ has order q. Hence the cyclic subgroup (c′) of G has order q.

G being an abelian group of order n, let


(7.3)  $n = p_1^{\alpha_1} p_2^{\alpha_2} \cdots p_r^{\alpha_r}$


be the prime factorization of n (distinct $p_i$). For brevity set


(7.4)  $m_i = p_i^{\alpha_i}$

Define

(7.5)  $H_i$ = set of all elements x in G such that $x^{m_i} = e$.


We observe first of all that $H_i$ is a subgroup of G. For if a, b are in $H_i$, then
$(ab)^{m_i} = a^{m_i}b^{m_i} = e$, since G is abelian, and so ab is in $H_i$. Clearly $a^{-1}$, $b^{-1}$ are
also in $H_i$.


(7.6)  The order of $H_i$ is a power of $p_i$.


For otherwise the order of $H_i$ would be divisible by some other prime q. By
(7.1), $H_i$ would contain a subgroup K of order q. Let b be an element of K different
from e. We have $b^{m_i} = e$ and $b^q = e$ (Theorem 4.2). Since $m_i$ and q are relatively
prime, there are integers h, k such that $m_ik + qh = 1$. Then
$b = b^{m_ik + qh} = b^{m_ik}b^{qh} = e$, a contradiction.


THEOREM 7.1 G is the direct sum of the subgroups $H_1, \ldots, H_r$ of (7.5). That is,
every element a in G can be written in one and only one way in the form
$a = a_1a_2 \cdots a_r$ with $a_i$ in $H_i$.

Proof. Suppose first that $a_1a_2 \cdots a_r = b_1b_2 \cdots b_r$, with $a_i$ and $b_i$ in
$H_i$. Since G is abelian we have $a_1b_1^{-1} = (a_2^{-1}b_2) \cdots (a_r^{-1}b_r)$. The
left-hand side is an element of $H_1$. The right-hand side is an element of
$H_2 \cdots H_r$. Raise both sides to the power $n/m_1 = m_2 \cdots m_r$. On the
right we get the identity, since $(a_i^{-1}b_i)^{m_i} = e$. Hence $(a_1b_1^{-1})^{m_2 \cdots m_r} = e$.
But by (7.6) and Theorem 4.2, the order of any element of $H_1$ is a power of
$p_1$. Thus, if x ≠ e in $H_1$ and if $x^k = e$, then $p_1$ must divide k. Since $p_1$ does
not divide $m_2 \cdots m_r$, it follows that $a_1b_1^{-1} = e$, or $a_1 = b_1$. Similarly,
$a_i = b_i$ for i = 2, ..., r.

Now put $k_i = n/m_i$. The $k_i$ are relatively prime, clearly, and therefore
there exist integers $h_i$ such that $h_1k_1 + \cdots + h_rk_r = 1$. For any a in
G set

$a_i = a^{h_ik_i}$  (i = 1, ..., r)

Then $a_i^{m_i} = a^{h_ik_im_i} = a^{h_in} = (a^n)^{h_i} = e$, by Theorem 4.2. Therefore $a_i$
is in $H_i$. Furthermore,

$a_1a_2 \cdots a_r = a^{h_1k_1}a^{h_2k_2} \cdots a^{h_rk_r} = a^{h_1k_1 + \cdots + h_rk_r} = a$.  Q.E.D.



COROLLARY 1 The subgroup $H_i$ has order $m_i = p_i^{\alpha_i}$.
Proof. From (7.6), $H_i$ has some order $p_i^{\beta_i}$. Then $H_1H_2 \cdots H_r$
contains $p_1^{\beta_1} \cdots p_r^{\beta_r}$ elements, clearly. From Theorem 7.1 and (7.3)
it follows that $\beta_i = \alpha_i$. Q.E.D.


COROLLARY 2 If G is decomposed in any way into a direct sum of subgroups
$H_1', \ldots, H_s'$ whose orders are powers of distinct primes, then that decomposition is the
same as the decomposition of Theorem 7.1, apart from the enumeration.

Proof. Let $H_i'$ have order $q_i^{\beta_i}$. By Theorem 4.2, $q_i^{\beta_i}$ must divide n,
and so $q_i$ must be the same as one of the $p_j$. We can renumber things so
that $q_i = p_i$ (i = 1, ..., s). It follows from the direct-sum assumption
that the order of G is equal to $p_1^{\beta_1} \cdots p_s^{\beta_s}$. Hence, s = r and $\beta_i = \alpha_i$.
Then for any a in $H_i'$ we have $a^{m_i} = e$, and so $H_i'$ is contained in $H_i$, by
definition of $H_i$. Therefore $H_i = H_i'$, by Corollary 1. Q.E.D.


To obtain a further decomposition of G we must now investigate groups of 
prime-power order. 


THEOREM 7.2 Let H be an abelian group of order $p^\alpha$ (p a prime). Then H is the direct
sum of a certain number of cyclic subgroups.
Proof. We use induction on $\alpha$. For $\alpha = 1$ the assertion is clearly true,
for then H is cyclic of order p and contains no subgroups other than itself
and (e).
Now given $\alpha > 1$, we suppose that the theorem holds for all groups of
order $p^\gamma$, with $\gamma < \alpha$. By Theorem 4.2, the order of every element of
H is a power of p. Let a be an element of maximum order $p^\beta$, and let
K = (a) be the cyclic subgroup generated by a. If $\beta = \alpha$, then K = H,
and so H itself is cyclic. Suppose then that $\beta < \alpha$. The quotient group
H/K has order $p^{\alpha-\beta}$. By induction hypothesis, H/K is the direct sum
of certain cyclic subgroups $\bar{H}_1, \ldots, \bar{H}_r$. Let the order of $\bar{H}_i$ be $p^{\gamma_i}$,
and let $\bar{H}_i$ be generated by the coset $c_iK$. Then $c_i^{p^{\gamma_i}}$ is the lowest power of
$c_i$ which is in K. To simplify notation, write c for $c_i$ and $\gamma$ for $\gamma_i$. Then
$c^{p^\gamma}$ is in K, hence is a power of a, say

$c^{p^\gamma} = a^s$

Now $c^{p^\beta} = e$, by definition of $\beta$, and so

$c^{p^\beta} = (c^{p^\gamma})^{p^{\beta-\gamma}} = (a^s)^{p^{\beta-\gamma}} = a^{sp^{\beta-\gamma}} = e$

Therefore $sp^{\beta-\gamma}$ must be a multiple of $p^\beta$, say

$sp^{\beta-\gamma} = kp^\beta$

or

$s = kp^\gamma$

Set

$b = ca^{-k}$

then

$b^{p^\gamma} = c^{p^\gamma}a^{-kp^\gamma} = a^sa^{-s} = e$

Restoring the index i, for each $c_i$ we have produced an element $b_i$ in
the same coset of K and such that $b_i$ has order precisely $p^{\gamma_i}$. Let
$H_i = (b_i)$ denote the cyclic subgroup of H generated by $b_i$, and consider the
subgroup

$H' = KH_1H_2 \cdots H_r$

of H. Let $x_0x_1 \cdots x_r$ and $y_0y_1 \cdots y_r$ be two of its elements, with
$x_0$, $y_0$ in K and $x_i$, $y_i$ in $H_i$ (i = 1, ..., r). If they are equal, then
$x_0y_0^{-1} = (x_1^{-1}y_1) \cdots (x_r^{-1}y_r)$. The canonical homomorphism f: H → H/K
carries the left-hand side into the identity of H/K; and it carries
each element $x_i^{-1}y_i$ into an element of $\bar{H}_i$. Since H/K is the direct sum
of the $\bar{H}_i$, it follows that f must map each $x_i^{-1}y_i$ into the identity of H/K.
In other words, each $x_i^{-1}y_i$ is in K. But the only element in both K and
$H_i$ is e (for the lowest power of $b_i$ in K is $b_i^{p^{\gamma_i}} = e$). Hence $x_i = y_i$. It
follows that H′ contains precisely $p^\beta \cdot p^{\gamma_1} \cdots p^{\gamma_r} = p^\beta p^{\alpha-\beta} = p^\alpha$
elements. Hence H′ = H, and therefore we have H expressed as a direct
sum of cyclic subgroups. Q.E.D.


To establish some kind of uniqueness concerning the decomposition of an abelian
group of order $p^\alpha$ into cyclic subgroups, we need the following simple result.


THEOREM 7.3 Let H be a cyclic group of order $p^\alpha$. Then the elements of order at
most $p^\gamma$ ($\gamma \le \alpha$) in H form a subgroup of order $p^\gamma$.
Proof. Let H be generated by a. An element $a^k$ has $(a^k)^{p^\gamma} = e$ if and
only if $kp^\gamma$ is a multiple of $p^\alpha$, or k a multiple of $p^{\alpha-\gamma}$. There are $p^\gamma$ integers
in the range $0 \le k < p^\alpha$ which are multiples of $p^{\alpha-\gamma}$. Q.E.D.


COROLLARY 1 Let H be an abelian group of order $p^\alpha$ (p a prime), and let H be
decomposed in any way into a direct sum of cyclic subgroups $H_1, \ldots, H_r$ of orders >1.
Then the number r of them and their orders are completely determined by H.

Proof. Let $n_\gamma$ denote the number of elements x in H such that $x^{p^\gamma} = e$.
Write $x = x_1x_2 \cdots x_r$, with $x_i$ in $H_i$. Then $x^p = e$ if and only if each
$x_i^p = e$, by the direct-sum assumption. By Theorem 7.3 each $H_i$ contains
p elements such that $x_i^p = e$. It follows readily that $n_1 = p^r$, showing
that r depends only on H. In a similar way it is easily seen that if s of
the $H_i$ have order >p, then $n_2 - n_1 = p^{r+s} - p^r$. Thus s depends only
on H. Continuing in this way one finds that the number of $H_i$ of order
$> p^\gamma$ can be expressed in terms of the $n_\gamma$, hence is determined by H alone.
Q.E.D.


COROLLARY 2 Let H be an abelian group of order $p^\alpha$. Then H is cyclic if and only
if the number of elements x in H such that $x^p = e$ is p.
This follows at once from Corollary 1, since then r = 1.
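
The counts $n_\gamma$ used in the proof of Corollary 1 are easy to tabulate for small groups. The sketch below does so for the three direct sums of cyclic groups of order 8, written additively as tuples of residues; this representation and the function name are assumptions of the illustration, not notation from the text.

```python
from itertools import product

def count_killed(orders, p, gamma):
    """n_gamma: number of x in Z_{m_1} + ... + Z_{m_r} with p**gamma * x = 0."""
    q = p ** gamma
    return sum(
        all((q * x) % m == 0 for x, m in zip(element, orders))
        for element in product(*(range(m) for m in orders))
    )

# The three ways of writing a group of order 8 as a direct sum of cyclic groups.
for orders in [(8,), (4, 2), (2, 2, 2)]:
    print(orders, [count_killed(orders, 2, g) for g in (1, 2, 3)])
```

The first count equals $p^r$ in each case, so it already distinguishes the three groups, and only the cyclic group has exactly p solutions of $x^p = e$, in agreement with Corollary 2.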


Combining Theorem 7.1 and its Corollary 2 with Theorem 7.2 and its
Corollary 1, we have the following theorem:


THEOREM 7.4 Any finite abelian group G can be expressed as a direct sum of cyclic
subgroups of prime power order. The number of the cyclic subgroups in the direct
sum and their orders are completely determined by G. If G′ is another abelian group
for which the corresponding set of integers is the same as for G, then G and G′ are
isomorphic.

The last assertion is easily proved as follows. Let $G = H_1H_2 \cdots H_r$ and
$G' = H_1'H_2' \cdots H_r'$ be direct-sum decompositions, the $H_i$ and $H_i'$ being cyclic of
prime-power order. From our assumption we can assume that the numbering is such
that $H_i$ and $H_i'$ have the same order (i = 1, ..., r). Being cyclic, they must
clearly be isomorphic. Let $f_i\colon H_i \to H_i'$ be any isomorphism. For x in G write
$x = x_1 \cdots x_r$ with $x_i$ in $H_i$. Define f(x) by


$f(x) = f_1(x_1) \cdots f_r(x_r)$


It is easy to verify that this prescription defines an isomorphism from G to G′
(cf. Exercise 12, Sec. 6).


THEOREM 7.5 Let G be the direct sum of cyclic groups $H_1, \ldots, H_r$ of orders
$m_1, \ldots, m_r$, respectively. If the $m_i$ are relatively prime in pairs, then G is cyclic.
Proof. Let $H_i$ be generated by $a_i$ and put $a = a_1a_2 \cdots a_r$. Then
$a^k = a_1^ka_2^k \cdots a_r^k$, and so $a^k = e$ if and only if $a_j^k = e$ for j = 1, ...,
r. That is so if and only if k is divisible by $m_1, \ldots, m_r$, and thus k must
be divisible by their product. The cyclic subgroup (a) therefore contains
$m_1 \cdots m_r$ elements, and that is the order of G. Hence (a) = G.
Q.E.D.
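
The order of the element $a = a_1a_2 \cdots a_r$ used in the proof can be computed directly. The sketch below models each $H_i$ as the integers modulo $m_i$ under addition with generator 1; that model and the helper names are assumptions made for the illustration.

```python
from math import gcd

def order_of_generator_product(orders):
    """Order of a = (1, ..., 1) in the direct sum of the cyclic groups Z_{m_i}."""
    k, current = 1, tuple(1 % m for m in orders)
    zero = tuple(0 for _ in orders)
    while current != zero:
        k += 1
        # current holds k * a, computed coordinate by coordinate
        current = tuple((x + 1) % m for x, m in zip(current, orders))
    return k

def pairwise_coprime(orders):
    return all(gcd(a, b) == 1
               for i, a in enumerate(orders) for b in orders[i + 1:])

for orders in [(3, 4), (3, 4, 5), (2, 4)]:
    size = 1
    for m in orders:
        size *= m
    print(orders, pairwise_coprime(orders),
          order_of_generator_product(orders), size)
```

When the orders are relatively prime in pairs, the order of a equals the order of the whole group, so a generates it; for (2, 4) the order of a falls short.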


EXERCISES 

1. Describe in detail how to decompose a cyclic group of order n into a direct
sum of cyclic subgroups of prime-power order. Write out the subgroups for
n = 15 and n = 20. To what extent are the subgroups unique?

2. Referring to the proof of Corollary 1, Theorem 7.3, show that
$n_2 - n_1 = p^{r+s} - p^r$, and work out the corresponding formula for $n_3 - n_2$.

3. Let G be a finite abelian group. By a character $\chi$ of G is meant a mapping
G → C (complex field) such that $\chi(ab) = \chi(a) \cdot \chi(b)$ for any a, b in G and such
that $\chi$ does not map every element of G into zero. Show then that $\chi$ cannot map
any element into zero and is therefore a homomorphism of G to the multiplicative
group $C^*$ of nonzero complex numbers. Prove that $\chi(a)$ must be a number of
absolute value 1 for any a in G. Prove that all the characters of G form a group $\hat{G}$
if we define $\chi\chi'$ by the rule $(\chi\chi')(a) = \chi(a) \cdot \chi'(a)$ for any two characters and
any a in G. $\hat{G}$ is called the character group of G.

4, Write out all the characters of Zs*. 

5. Referring to Exercise 3, show that


$\sum_{a} \chi(a) = 0$


for any character $\chi \neq \chi_0$, where $\chi_0$ denotes the principal character, i.e., the unit
element of $\hat{G}$. [Hint: Replace a in $\chi(a)$ by ab, b being an arbitrary element of G.]
Show that


$\sum_{\chi} \chi(a) = 0$


if a is not the unit element of G.


6. Show that a finite abelian group G and its character group $\hat{G}$ are isomorphic.
[Hint: Use Theorem 7.4.]

7. Let G be a finite abelian group. Show that G is the direct sum of subgroups
$G_1, \ldots, G_t$ such that the order $n_i$ of $G_i$ divides the order $n_{i+1}$ of $G_{i+1}$ for
i = 1, ..., t − 1. Show that the $n_i$ are uniquely determined by G (they are called its
torsion coefficients). [Hint: Write G as a direct sum of cyclic groups $H_i$ of
prime-power order and assemble the latter judiciously, taking note of Theorem 7.5.]

8. Suppose that G can be expressed as the direct sum of subgroups of the
following respective orders:

2, 2, 2, 5, 5%, 7, 7. 

What are the torsion coefficients of G? 


11 
Determinants 


1. Introduction 


The first part of this chapter is concerned with the problem of defining what is
meant by the determinant of a square matrix. Before taking that up in the general
case, it is helpful to review the special case of 2 × 2 matrices.
It is well known that the determinant of a 2 × 2 matrix $a = (a^i_j)$ with
coefficients in a field K is the quantity

$a^1_1a^2_2 - a^1_2a^2_1$


again an element of K. For example, the determinant of 


¢ 1 

3 }) 

is $-5$. The determinant is thus a function, or operation, which produces an
element of K from any 2 × 2 matrix with coefficients in K. It is obvious that if
the two columns (or else the two rows) of a 2 × 2 matrix are interchanged, then
its determinant changes sign; the determinant is zero if the two columns (or two
rows) are identical. Some other simple properties are the following: For the two
matrices

$\begin{pmatrix} a^1_1 & a^1_2 \\ a^2_1 & a^2_2 \end{pmatrix}$ and $\begin{pmatrix} ca^1_1 & a^1_2 \\ ca^2_1 & a^2_2 \end{pmatrix}$

the determinant of the second is equal to the determinant of the first multiplied
by c (and similarly for a common factor in the second column, or else in one of the
two rows). Further, the determinant of

$\begin{pmatrix} a^1_1 + b^1 & a^1_2 \\ a^2_1 + b^2 & a^2_2 \end{pmatrix}$

is $(a^1_1 + b^1)a^2_2 - (a^2_1 + b^2)a^1_2$, which is just the sum of the determinants of

$\begin{pmatrix} a^1_1 & a^1_2 \\ a^2_1 & a^2_2 \end{pmatrix}$ and $\begin{pmatrix} b^1 & a^1_2 \\ b^2 & a^2_2 \end{pmatrix}$

(a similar statement holds if the b's are added to the second column, or else to one
of the two rows). Clearly the determinant of the 2 × 2 unit matrix $I_2$ is 1 (the
unit element of the field K).
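
The properties just listed can be confirmed on a numerical instance. In the sketch below the matrix entries, the added column b, and the scalar c are arbitrary illustrative values chosen for this example, and the field K is taken to be the rationals (integers suffice here).

```python
def det2(m):
    """Determinant of a 2 x 2 matrix given as ((a11, a12), (a21, a22))."""
    (a11, a12), (a21, a22) = m
    return a11 * a22 - a12 * a21

def swap_columns(m):
    return tuple((row[1], row[0]) for row in m)

a = ((2, 7), (3, 5))   # arbitrary illustrative entries
b = (4, -1)            # column to be added to the first column
c = 6                  # scalar factor for the first column

scaled = tuple((c * row[0], row[1]) for row in a)
summed = tuple((row[0] + bi, row[1]) for row, bi in zip(a, b))
b_first = tuple((bi, row[1]) for row, bi in zip(a, b))

print(det2(swap_columns(a)) == -det2(a))            # sign change under a column swap
print(det2(scaled) == c * det2(a))                  # common factor in the first column
print(det2(summed) == det2(a) + det2(b_first))      # additivity in the first column
print(det2(((1, 0), (0, 1))) == 1)                  # unit matrix
```

A single instance proves nothing in general, but it makes the three kinds of behavior (alternating, linear in a column, normalized) concrete.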




We shall see that these simple properties are characteristic of determinants, and
we shall in effect take them as a definition of determinants. A geometrical
interpretation of determinants is given in Sec. 6.


2. Axioms for determinants 


We consider now n × n matrices with coefficients in a field K, both n and K being
fixed for the present. According to the notation of Chap. 9, the set of all such
matrices is denoted by $K^n_n$.
Suppose that in some way there is defined a mapping F from $K^n_n$ to K. That is,
F assigns to each matrix a in $K^n_n$ a scalar F(a). For our purposes it will be
convenient to think of a in terms of its column vectors† $a_1, a_2, \ldots, a_n$, and we shall
frequently write $F(a_1, a_2, \ldots, a_n)$, or other similar expressions, for F(a).
Suppose now that F satisfies the following hypotheses:
(1) F(a) changes sign if two adjacent columns of a are interchanged, and F(a) =
0 if two adjacent columns of a are identical.‡
(2) F is linear as a function of the first column. That is,

$F(a_1 + b, a_2, \ldots, a_n) = F(a_1, a_2, \ldots, a_n) + F(b, a_2, \ldots, a_n)$,
$F(ca_1, a_2, \ldots, a_n) = c \cdot F(a_1, a_2, \ldots, a_n)$

for any $a_1, \ldots, a_n$, b in $K^n$ and any c in K.

(3) $F(I_n) = 1$.

Observe that these three hypotheses are straightforward extensions of the
properties noted above for 2 × 2 matrices. Our aim is to show that there is one
and only one mapping F: $K^n_n$ → K which satisfies (1), (2), and (3), and F(a) will
then be defined as the determinant of a. It is important to remark that (3) is not
used until (2.7). In Propositions 2.1 and 2.2, F stands for an operation satisfying
axioms (1) and (2).


PROPOSITION 2.1 If a′ is the matrix obtained from some matrix a by interchanging
two columns, then F(a′) = −F(a). If any two columns of a are identical, then
F(a) = 0. Finally, F is linear as a function of any of the columns. That is,§

(2.1)  $F(\ldots, a_j + b, \ldots) = F(\ldots, a_j, \ldots) + F(\ldots, b, \ldots)$
       $F(\ldots, ca_j, \ldots) = c \cdot F(\ldots, a_j, \ldots)$


† We could equally well use the row vectors.

‡ If two adjacent columns of a are identical, then a remains unaltered if they are permuted;
but F(a) must change sign, by the first assumption. Hence we have F(a) = −F(a), or
(1 + 1) · F(a) = 0. If 1 + 1 ≠ 0, then we can conclude that F(a) = 0, so that the second
assumption is unnecessary. However, there exist fields in which 1 + 1 = 0, and for such
fields the conclusion that F(a) = 0 is not valid. That is why it is explicitly included in
our assumptions. An example of a field of characteristic 2 (i.e., in which 1 + 1 = 0) is
the field $Z_2$ of Sec. 10, Chap. 2. It contains only two elements. A somewhat less trivial
example is the rational function field $Z_2(t)$ in an indeterminate t over $Z_2$.

§ The notation of (2.1) is meant to indicate that only the jth column is in question. All
other columns are arbitrary but fixed.


306 ‘Determinants Ch. 11, Sec. 2 


where $a_j$ and b are any two elements of $K^n$ (column vectors) and where c is any element
of K.
Proof. First of all, let a′ be the matrix obtained from a by interchanging
the ith and (i + k)th columns of a (k > 0). The transposition (i, i + k)
is equal to the product

$(i, i+1) \cdots (i+k-2, i+k-1)(i+k, i+k-1) \cdots (i+2, i+1)(i+1, i)$

which involves only permutations of adjacent indices, and there are 2k − 1
factors (see Exercise 7, Sec. 3, Chap. 10). Hence the permutation of
columns $a_i$ and $a_{i+k}$ can be achieved by a series of permutations of adjacent
columns. Each such permutation results in a change of sign of F, by
axiom (1), and since there are an odd number of them, namely, 2k − 1, it
follows that the net result is a change of sign: F(a′) = −F(a).

Now suppose that two columns of a are identical, say the ith and
(i + k)th. Let a′ denote the matrix obtained from a by interchanging
the ith column and the (i + k − 1)th column. Then a′ has two adjacent
columns which are identical, and therefore F(a′) = 0, by axiom (1). But,
by what has just been shown, F(a′) = −F(a), and so F(a) = 0.

Finally, to prove that (2.1) holds, we have only to interchange the first
and the jth columns in the expressions on the left. Then, applying
axiom (2) to the result, we again interchange the first and jth columns.
For example, to verify the second equation of (2.1), we have
$F(a_1, \ldots, ca_j, \ldots) = -F(ca_j, \ldots, a_1, \ldots) = -c \cdot F(a_j, \ldots, a_1, \ldots)$, by
axiom (2), and from the result established above this last expression is
equal to $+c \cdot F(a_1, \ldots, a_j, \ldots)$. The first equation of (2.1) is verified
similarly. Q.E.D.


COROLLARY 1 $F(a) = 0$ if any column of a is zero.

Proof. Suppose the first column is zero. We have then
$F(0, a_2, \dots, a_n) = F(0 \cdot 0, a_2, \dots, a_n) = 0 \cdot F(0, a_2, \dots, a_n) = 0$, by (2.1). The same
argument clearly holds for any column.

COROLLARY 2 Let a' be the matrix obtained from a given matrix a by an arbitrary
permutation $s$ of the columns of a. Then $F(a') = \pm F(a)$, the plus sign holding if $s$
is an even permutation and the minus sign holding if $s$ is an odd permutation.

This follows immediately from the first assertion of the theorem above and from 
Theorems 3.4 and 3.5, Chap. 10. 


REMARK. Since $F$ is linear in its dependence upon each of the $n$ column vectors,
it is called a multilinear function, or more specifically, an $n$-linear function. Because
$F$ changes sign if two columns are permuted, it is said to be skew-symmetric.

We recall that if a is any element of $K^n_n$, and if $c$ is any scalar, then $c\,a$ is the
matrix obtained by multiplying each coefficient of a by $c$ (see Sec. 6, Chap. 9).
We have

$$F(c\,a) = F(c\,a_1, \dots, c\,a_n) = c^n F(a_1, \dots, a_n) = c^n F(a) \tag{2.2}$$

as follows at once from (2.1). In particular,

$$F(-a) = (-1)^n F(a) \tag{2.3}$$


PROPOSITION 2.2 Let a' be obtained from a matrix a by adding $c$ times the $j$th column
of a to its $i$th column, where $i \neq j$. That is, a' has $a_i + c\,a_j$ as its $i$th column, all other
columns being the same as in a. Then $F(a') = F(a)$.
Proof. Assuming that $i < j$ we have

$$F(a') = F(\dots, a_i + c\,a_j, \dots, a_j, \dots) = F(\dots, a_i, \dots, a_j, \dots) + F(\dots, c\,a_j, \dots, a_j, \dots)$$

by (2.1). The last term here is zero, for the scalar $c$ can be taken out, by
the second equation of (2.1), leaving two identical columns. Q.E.D.


We now derive an explicit formula for $F(a)$. Let $e_j$ denote the $j$th column of
the $n \times n$ unit matrix $I_n$, so that $I_n = (e_1, e_2, \dots, e_n)$, and let $a = (a_1, a_2, \dots, a_n)$
be any $n \times n$ matrix with coefficients in the field $K$. We have then

$$a_k = a^1_k e_1 + \cdots + a^n_k e_n = a^j_k e_j \tag{2.4}$$

using the summation convention on the right. In particular, $a_1 = a^j_1 e_j$. Let us
substitute this expansion in $F(a_1, \dots, a_n)$. We get

$$F(a_1, a_2, \dots, a_n) = F(a^1_1 e_1 + \cdots + a^n_1 e_n, a_2, \dots, a_n)$$

Applying the first equation of (2.1) repeatedly to the right-hand side, we get for it

$$F(a^1_1 e_1, a_2, \dots, a_n) + \cdots + F(a^n_1 e_n, a_2, \dots, a_n)$$

From the second equation of (2.1) we find that this is equal to

$$a^1_1 \cdot F(e_1, a_2, \dots, a_n) + \cdots + a^n_1 \cdot F(e_n, a_2, \dots, a_n)$$

Using the summation convention, we can condense the foregoing calculation into

$$F(a_1, a_2, \dots, a_n) = F(a^j_1 e_j, a_2, \dots, a_n) = a^j_1 \cdot F(e_j, a_2, \dots, a_n)$$


In each term on the right we repeat this procedure, using the expansion (2.4)
for $a_2$. However, we must be careful to use a different summation index in order
to avoid conflict with the $j$ that is already there. To achieve a systematic notation
we shall therefore replace $j$ above by $j_1$, and we shall write $a_2 = a^{j_2}_2 e_{j_2}$, etc. The
equation above becomes

$$F(a_1, a_2, \dots, a_n) = a^{j_1}_1 F(e_{j_1}, a_2, \dots, a_n)$$

For each term on the right we have

$$F(e_{j_1}, a_2, \dots, a_n) = F(e_{j_1}, a^{j_2}_2 e_{j_2}, a_3, \dots, a_n) = a^{j_2}_2 F(e_{j_1}, e_{j_2}, a_3, \dots, a_n)$$

where we have again used (2.1) repeatedly, just as above for $a_1$. Substituting the
last expression in the preceding equation we obtain

$$F(a_1, a_2, \dots, a_n) = a^{j_1}_1 a^{j_2}_2 F(e_{j_1}, e_{j_2}, a_3, \dots, a_n)$$

Continuing in this way, replacing $a_3$ by its expansion $a^{j_3}_3 e_{j_3}$, etc., we finally arrive
at the formula

$$F(a_1, a_2, \dots, a_n) = a^{j_1}_1 a^{j_2}_2 \cdots a^{j_n}_n F(e_{j_1}, e_{j_2}, \dots, e_{j_n}) \tag{2.5}$$


Now what is $F(e_{j_1}, \dots, e_{j_n})$? If two of the indices $j_1, \dots, j_n$ happen to
have equal values (they all range independently from 1 to $n$), then $F(e_{j_1}, \dots, e_{j_n})$
is zero, by Proposition 2.1. The corresponding terms in the summation (2.5) can
therefore be omitted. If on the other hand no two of the indices $j_1, \dots, j_n$ have
the same value, then $j_1, \dots, j_n$ must be a permutation of $1, \dots, n$. In that
case the matrix $(e_{j_1}, \dots, e_{j_n})$ is obtained from the unit matrix by the permutation

$$\begin{pmatrix} 1 & 2 & \cdots & n \\ j_1 & j_2 & \cdots & j_n \end{pmatrix}$$

of its columns. Hence,

$$F(e_{j_1}, \dots, e_{j_n}) = \operatorname{sign}\begin{pmatrix} 1 & 2 & \cdots & n \\ j_1 & j_2 & \cdots & j_n \end{pmatrix} F(I_n)$$

by Corollary 2, Proposition 2.1, where the symbol sign $(\cdots)$ stands for $+1$ if
the permutation is even and for $-1$ if it is odd. Putting this in (2.5) we get the
formula

$$F(a) = F(I_n) \cdot \sum \operatorname{sign}\begin{pmatrix} 1 & 2 & \cdots & n \\ j_1 & j_2 & \cdots & j_n \end{pmatrix} a^{j_1}_1 a^{j_2}_2 \cdots a^{j_n}_n \tag{2.6}$$

the summation being over all permutations of $1, 2, \dots, n$. We recall that only
axioms (1) and (2) were used in arriving at this formula. It shows that $F(a)$ is
completely determined once $F(I_n)$ is specified. If we now add axiom (3), $F(I_n) = 1$,
it follows that there cannot be two different functions $F$ satisfying (1), (2), (3).
The unique function satisfying the three axioms (we have not yet proved that it
exists) is denoted by det rather than by $F$, and according to (2.6) it must be given
by the formula

$$\det a = \sum \operatorname{sign}\begin{pmatrix} 1 & 2 & \cdots & n \\ j_1 & j_2 & \cdots & j_n \end{pmatrix} a^{j_1}_1 a^{j_2}_2 \cdots a^{j_n}_n \tag{2.7}$$


But now it is easily seen that the right-hand side of (2.7) defines a function
which satisfies axioms (1), (2), and (3). The details of the verification are left to
the reader. Axioms (2) and (3) are trivial to check; the verification of
axiom (1) requires Theorem 3.6, Chap. 10. We indicate the argument briefly,
taking columns 1 and 2 for simplicity. If those two columns are interchanged,
then the right-hand side of (2.7) becomes




$$\sum \operatorname{sign}\begin{pmatrix} 1 & 2 & \cdots & n \\ j_1 & j_2 & \cdots & j_n \end{pmatrix} a^{j_1}_2 a^{j_2}_1 \cdots a^{j_n}_n$$

Now this symbol has exactly the same meaning as

$$\sum \operatorname{sign}\begin{pmatrix} 1 & 2 & \cdots & n \\ j_2 & j_1 & \cdots & j_n \end{pmatrix} a^{j_2}_2 a^{j_1}_1 \cdots a^{j_n}_n \tag{2.8}$$

since the particular choice of summation index is immaterial (provided always
that no conflict results). The permutation indicated here can also be written

$$\begin{pmatrix} 2 & 1 & \cdots & n \\ j_1 & j_2 & \cdots & j_n \end{pmatrix}$$

and it is equal to the product of
$\begin{pmatrix} 1 & 2 & \cdots & n \\ j_1 & j_2 & \cdots & j_n \end{pmatrix}$
and the transposition (1, 2), in that order. Therefore, by Theorem 3.6, Chap. 10, the permutations

$$\begin{pmatrix} 1 & 2 & \cdots & n \\ j_1 & j_2 & \cdots & j_n \end{pmatrix} \quad\text{and}\quad \begin{pmatrix} 1 & 2 & \cdots & n \\ j_2 & j_1 & \cdots & j_n \end{pmatrix}$$

have opposite signs, and it follows that (2.8) is the negative of (2.7). From similar
considerations it is easily seen that (2.7) is zero if columns 1 and 2 (or any other
two adjacent columns) are equal (see footnote ‡, page 305). We point out here
that an essential consideration in connection with (2.7) is the possibility of classifying
permutations into even and odd categories. We state our results as the
following theorem:
THEOREM 2.3 There is one and only one mapping from $K^n_n$ to $K$ which satisfies
axioms (1), (2), and (3) above, and it is given by formula (2.7).

Naturally Propositions 2.1 and 2.2, and Corollaries 1 and 2, Theorem 2.1, hold 
for det. 
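
Formula (2.7) can be tried out directly on small matrices. The following is a minimal Python sketch, not part of the text: the names `sign` and `det_leibniz` are hypothetical, a matrix is assumed to be stored as a list of rows (so `a[r][c]` is the entry with upper index `r` and lower index `c`, counted from 0), and the sign of a permutation is obtained by counting inversions rather than by the results of Chap. 10.

```python
from itertools import permutations


def sign(perm):
    """Sign of a permutation of 0..n-1: +1 if even, -1 if odd (inversion count)."""
    inversions = sum(
        1
        for i in range(len(perm))
        for k in range(i + 1, len(perm))
        if perm[i] > perm[k]
    )
    return -1 if inversions % 2 else 1


def det_leibniz(a):
    """Determinant by the permutation sum (2.7); a is a list of rows."""
    n = len(a)
    total = 0
    for js in permutations(range(n)):
        term = sign(js)
        for col, row in enumerate(js):
            term *= a[row][col]  # the factor a^{j_col}_{col}
        total += term
    return total


a = [[2, 0, 1], [1, 3, -1], [0, 5, 4]]
swapped = [[0, 2, 1], [3, 1, -1], [5, 0, 4]]  # columns 1 and 2 of a interchanged
print(det_leibniz(a), det_leibniz(swapped))
```

The sum has $n!$ terms, so this is only reasonable for very small $n$; it is meant as a check on the axioms (for instance, that interchanging two columns reverses the sign), not as a practical method.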


EXAMPLE For $2 \times 2$ matrices, (2.7) reduces to

$$\det a = \sum \operatorname{sign}\begin{pmatrix} 1 & 2 \\ j_1 & j_2 \end{pmatrix} a^{j_1}_1 a^{j_2}_2 = a^1_1 a^2_2 - a^2_1 a^1_2$$

REMARK. For $1 \times 1$ matrices (i.e., scalars) axiom (1) does not apply. It is
easily seen that axioms (2) and (3) lead simply to $\det (a) = a$ in this trivial case,
and the same result is given by (2.7).
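THEOREM 2.4 For any $n \times n$ matrix a, $\det {}^t a = \det a$.
Proof. By definition of the transpose, the element in the $j$th row and
the $i$th column of ${}^t a$ is $a^i_j$. Consequently (2.7) becomes

$$\det {}^t a = \sum \operatorname{sign}\begin{pmatrix} 1 & 2 & \cdots & n \\ j_1 & j_2 & \cdots & j_n \end{pmatrix} a^1_{j_1} a^2_{j_2} \cdots a^n_{j_n}$$

Now let
$\begin{pmatrix} 1 & 2 & \cdots & n \\ k_1 & k_2 & \cdots & k_n \end{pmatrix}$
denote the inverse of the permutation
$\begin{pmatrix} 1 & 2 & \cdots & n \\ j_1 & j_2 & \cdots & j_n \end{pmatrix}$.
Applying that inverse to the $n$-tuple $a^1_{j_1}, a^2_{j_2}, \dots, a^n_{j_n}$
rearranges it into $a^{k_1}_1, a^{k_2}_2, \dots, a^{k_n}_n$, and so the sum above can also be
written

$$\sum \operatorname{sign}\begin{pmatrix} 1 & 2 & \cdots & n \\ j_1 & j_2 & \cdots & j_n \end{pmatrix} a^{k_1}_1 a^{k_2}_2 \cdots a^{k_n}_n$$

But the sign of a permutation is the same as the sign of its inverse, and so
this expression is equal to

$$\sum \operatorname{sign}\begin{pmatrix} 1 & 2 & \cdots & n \\ k_1 & k_2 & \cdots & k_n \end{pmatrix} a^{k_1}_1 a^{k_2}_2 \cdots a^{k_n}_n$$

which is $\det a$, by (2.7). Q.E.D.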


COROLLARY Propositions 2.1 and 2.2, as well as axioms (1) and (2), hold if the word
“column” is everywhere replaced by “row.”


THEOREM 2.5 Let a and b be two $n \times n$ matrices with coefficients in $K$. Then
$\det (ba) = (\det b)(\det a)$.

Proof. If as usual we denote the columns of a by $a_1, \dots, a_n$, then
the columns of $ba$ are the vectors $ba_1, \dots, ba_n$. Now let us temporarily
regard b as fixed and a as arbitrary. Define $F$ by the formula $F(a) =
\det (ba)$. Then, in terms of columns, we have

$$F(a_1, \dots, a_n) = \det (ba_1, \dots, ba_n)$$

Since det satisfies axioms (1) and (2), it is easy to see that $F$ defined in this
way also satisfies (1) and (2). Therefore from (2.6) we have

$$F(a) = F(I_n) \cdot \sum \operatorname{sign}\begin{pmatrix} 1 & 2 & \cdots & n \\ j_1 & j_2 & \cdots & j_n \end{pmatrix} a^{j_1}_1 a^{j_2}_2 \cdots a^{j_n}_n = F(I_n) \cdot \det a$$

by (2.7). But $F(I_n) = \det (bI_n) = \det b$, whence $F(a) = \det (ba) =
\det b \cdot \det a$. Q.E.D.


EXERCISES 
1. From (2.7) write out explicitly the formula for $3 \times 3$ determinants. Compute
1 0 2 4 1-1 1 t 
-1 2 9 i) 
or and aa('s t-2 “) 
> 128 B 2 t-8 


2. Let a be an $n \times n$ triangular matrix. That is, all the elements below the
diagonal are zero: $a^j_i = 0$ if $i < j$. Show that
$\det a = a^1_1 a^2_2 \cdots a^n_n$.


3. Let a be an $n \times n$ skew-symmetric matrix, that is, $a^i_j = -a^j_i$, and suppose
that the field of scalars does not have characteristic 2 (see footnote ‡, page 305).
Prove that $\det a = 0$ if $n$ is odd.

4. Let a be an $n \times n$ matrix, and let c be a nonsingular $n \times n$ matrix (Sec. 7,
Chap. 9). Show that the determinant of $a' = c\,a\,c^{-1}$ is equal to the determinant
of a. Show furthermore that $\operatorname{Tr} a = \operatorname{Tr} a'$, where $\operatorname{Tr} a$ denotes the sum of the
diagonal elements of a, and similarly for a'. That is, $\operatorname{Tr} a = a^i_i$. (This quantity
is called the trace of a.)


3. Some applications


Determinants provide a very useful criterion for the linear dependence of vectors, 
as the following theorem shows. 


THEOREM 3.1 Let a be an $n \times n$ matrix with coefficients in a field $K$. Then $\det a = 0$
if and only if the columns of a are linearly dependent.

Proof. Suppose that the columns of a are linearly dependent. Then
one of them (for simplicity suppose the first) can be expressed as a linear
combination of the others, say $a_1 = c_2 a_2 + \cdots + c_n a_n$. By repeated
applications of Proposition 2.2 we have

$$\det (a_1, \dots, a_n) = \det (a_1 - c_2 a_2 - \cdots - c_n a_n, a_2, \dots, a_n) = \det (0, a_2, \dots, a_n) = 0$$

by Corollary 1 of Proposition 2.1. If, on the other hand, the columns of
a are linearly independent, then a has an inverse $a^{-1}$, by Theorem 8.3,
Chap. 9.
Applying Theorem 2.5 to the equation $a a^{-1} = I_n$, we get $(\det a)
(\det a^{-1}) = 1$, which shows that $\det a \neq 0$. Q.E.D.


COROLLARY 1 $\det a = 0$ if and only if the rows of a are linearly dependent; a has an
inverse if and only if $\det a \neq 0$.

For the first assertion we have only to replace a in the theorem above by its 
transpose, taking account of Theorem 2.4. The second assertion follows from the 
foregoing and from Theorem 8.3, Chap. 9. 

Now let a denote any $m \times n$ matrix with coefficients in $K$, and let b denote any
one of the $r \times r$ submatrices of a which can be obtained by striking out $m - r$
rows and $n - r$ columns of a (naturally we assume that $r \leq m, n$). The quantity
$\det b$ is called an $r \times r$ minor of a, or a minor of order $r$. We have

COROLLARY 2 The rank of a matrix is equal to the largest integer $r$ for which the
matrix has a nonzero minor of order $r$.
Proof. This follows at once from the theorem above and from the
corollary of Theorem 8.3, Chap. 9.
We now show a method for calculating determinants which is often simpler than
the formula (2.7). Again let a denote an $n \times n$ matrix with coefficients in a
field $K$.


DEFINITION 3.1 The $(n - 1) \times (n - 1)$ matrix obtained by striking out the $k$th row
and $j$th column of the $n \times n$ matrix a is called the minor of the element $a^k_j$, and it is
denoted by $\mathbf{A}^k_j$. The scalar

$$A^k_j = (-1)^{k+j} \cdot \det \mathbf{A}^k_j$$

is called the cofactor of the element $a^k_j$.
Consider now the mapping $F: K^n_n \to K$ of $n \times n$ matrices to the field $K$ defined
by

$$F(a) = a^k_1 A^k_1 + a^k_2 A^k_2 + \cdots + a^k_n A^k_n$$

the index $k$ being arbitrary but fixed. We claim that $F$ satisfies axioms (1), (2),
(3) of Sec. 2 and must therefore be the same as det. To verify (1) let us consider
the effect of interchanging the first two columns of a. That produces an interchange
of the corresponding columns of all the minors $\mathbf{A}^k_j$ except $\mathbf{A}^k_1$ and $\mathbf{A}^k_2$, and
consequently all the cofactors $A^k_j$ must change sign, except $A^k_1$ and $A^k_2$. But
those two must also change sign; for the reversal of columns 1 and 2 interchanges
the minors $\mathbf{A}^k_1$ and $\mathbf{A}^k_2$, hence produces a change of sign in the cofactors because
of the factors $(-1)^{k+1}$ and $(-1)^{k+2}$ involved in their definition. It follows at once
that $F(a)$ must change sign. If the first two columns of a are identical, then
$F(a) = 0$; for then all the minors $\mathbf{A}^k_j$ $(j \neq 1, 2)$ have two identical columns, and
therefore the corresponding cofactors are zero. Hence the expression above for
$F(a)$ reduces to its first two terms, and they are identical except for sign. The
same argument applies to any two adjacent columns of a, and therefore $F$ satisfies
axiom (1). The verification of axioms (2) and (3) is routine. We have the following
theorem:


THEOREM 3.2 For any $n \times n$ matrix with coefficients in $K$ $(n > 1)$,

$$\det a = a^k_1 A^k_1 + a^k_2 A^k_2 + \cdots + a^k_n A^k_n = a^1_k A^1_k + a^2_k A^2_k + \cdots + a^n_k A^n_k$$

The first formula is often called the “expansion of det a by the $k$th row”; the
second formula, expansion of det a by the $k$th column, follows from the first
applied to the $k$th row of ${}^t a$.
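
The expansion by a row lends itself to a short recursive computation. What follows is an illustrative Python sketch under stated assumptions: the helper names `minor` and `det_by_row` are hypothetical, matrices are lists of rows with indices counted from 0 (which does not affect the parity of $k + j$), and the matrix used at the end is made up for the illustration rather than taken from the text.

```python
def minor(a, k, j):
    """Matrix obtained from a by striking out row k and column j (0-based)."""
    return [row[:j] + row[j + 1:] for r, row in enumerate(a) if r != k]


def det_by_row(a, k=0):
    """Expand det a by row k; the smaller determinants are expanded by row 0."""
    n = len(a)
    if n == 1:
        return a[0][0]
    return sum(
        a[k][j] * (-1) ** (k + j) * det_by_row(minor(a, k, j))
        for j in range(n)
    )


a = [[3, 1, -2], [4, 0, 2], [1, -3, 5]]  # hypothetical matrix
# By Theorem 3.2 the value does not depend on the row chosen.
print([det_by_row(a, k) for k in range(3)])
```

Expanding by the second row (`k=1`) skips one cofactor entirely because of the zero entry, which is the practical point of choosing a row with many zeros.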


EXAMPLE 1 The following determinant is computed using expansion by the second
row (determinants are often indicated by enclosing the matrix elements between
vertical straight lines):


3 1 -2| 1 
402 be | 
1-38 | 1 - 




In computing determinants by the method of Theorem 3.2 it is obviously ad- 
vantageous to expand by a row or column containing as many zeros as possible. 


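EXAMPLE 2 The well-known Vandermonde determinant

$$\Delta_n = \begin{vmatrix} 1 & x_1 & x_1^2 & \cdots & x_1^{n-1} \\ 1 & x_2 & x_2^2 & \cdots & x_2^{n-1} \\ \vdots & \vdots & \vdots & & \vdots \\ 1 & x_n & x_n^2 & \cdots & x_n^{n-1} \end{vmatrix}$$

is equal to $\prod_{i>k} (x_i - x_k)$. (Superscripts here are exponents.)
This may be proved by induction, starting with $n = 2$, where both sides of the
identity yield $x_2 - x_1$. Assume now the identity true for all positive integers
smaller than $n$. Applying Proposition 2.2 repeatedly, we subtract from the $n$th
column the $(n - 1)$th column multiplied by $x_1$, then subtract from the $(n - 1)$th
column the $(n - 2)$th column multiplied by $x_1$, etc. In this way, we obtain

$$\Delta_n = \begin{vmatrix} 1 & 0 & 0 & \cdots & 0 \\ 1 & x_2 - x_1 & x_2^2 - x_1 x_2 & \cdots & x_2^{n-1} - x_1 x_2^{n-2} \\ \vdots & \vdots & \vdots & & \vdots \\ 1 & x_n - x_1 & x_n^2 - x_1 x_n & \cdots & x_n^{n-1} - x_1 x_n^{n-2} \end{vmatrix}$$

We now apply the corollary of Theorem 2.4 and subtract the first row from
the others, obtaining

$$\Delta_n = \begin{vmatrix} 1 & 0 & 0 & \cdots & 0 \\ 0 & x_2 - x_1 & x_2^2 - x_1 x_2 & \cdots & x_2^{n-1} - x_1 x_2^{n-2} \\ \vdots & \vdots & \vdots & & \vdots \\ 0 & x_n - x_1 & x_n^2 - x_1 x_n & \cdots & x_n^{n-1} - x_1 x_n^{n-2} \end{vmatrix}$$

By Proposition 2.1 and the corollary of Theorem 2.4, we may take the factors
$x_2 - x_1, x_3 - x_1, \dots, x_n - x_1$ out of the second, third, $\dots$, $n$th rows of $\Delta_n$, and
obtain

$$\Delta_n = (x_2 - x_1)(x_3 - x_1) \cdots (x_n - x_1) \begin{vmatrix} 1 & 0 & 0 & \cdots & 0 \\ 0 & 1 & x_2 & \cdots & x_2^{n-2} \\ \vdots & \vdots & \vdots & & \vdots \\ 0 & 1 & x_n & \cdots & x_n^{n-2} \end{vmatrix}$$

If we now expand by the first row, we see that the remaining determinant is equal
to the Vandermonde determinant $\Delta_{n-1}$ whose entries are powers of $x_2, x_3, \dots, x_n$.
The result now follows by induction, since

$$\prod_{n \geq i > k \geq 1} (x_i - x_k) = \prod_{n \geq i > 1} (x_i - x_1) \cdot \prod_{n \geq i > k \geq 2} (x_i - x_k)$$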

From the theorem above we derive two important corollaries: 
COROLLARY 1 If $j \neq k$, then

$$a^j_1 A^k_1 + a^j_2 A^k_2 + \cdots + a^j_n A^k_n = 0$$

and

$$a^1_j A^1_k + a^2_j A^2_k + \cdots + a^n_j A^n_k = 0$$

Proof. Let a' be the matrix obtained by replacing the $k$th row of a by
the $j$th row. Then the $j$th and $k$th rows of a' are identical, and so $\det a' =
0$, by Proposition 2.1. Expanding $\det a'$ by the $k$th row according to
Theorem 3.2, we get the first equation of the corollary. The second equation
follows similarly by using the columns of a. Q.E.D.


COROLLARY 2 Let $\tilde a = {}^t(A^j_k)$ be the transpose of the matrix of cofactors of a. Then
$a \tilde a = \tilde a a = (\det a) \cdot I_n$. Hence, if $\det a \neq 0$, we have

$$a^{-1} = \frac{\tilde a}{\det a}$$

Proof. This follows immediately from the equations in Theorem 3.2
and Corollary 1 and from the definition of matrix multiplication.

DEFINITION 3.2 The matrix $\tilde a$ associated with a is called the adjoint of a.
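
As a check on Corollary 2, the adjoint can be assembled directly from the cofactors. The following Python sketch is an illustration only: `det`, `adjoint`, and `matmul` are hypothetical helper names, matrices are lists of rows indexed from 0, and the sample matrix is made up for the purpose.

```python
def det(a):
    """Determinant by expansion along the first row; a is a list of rows."""
    if not a:
        return 1  # determinant of the empty matrix
    return sum(
        (-1) ** j * a[0][j] * det([row[:j] + row[j + 1:] for row in a[1:]])
        for j in range(len(a))
    )


def adjoint(a):
    """Transpose of the matrix of cofactors of a."""
    n = len(a)

    def cofactor(k, j):
        sub = [row[:j] + row[j + 1:] for r, row in enumerate(a) if r != k]
        return (-1) ** (k + j) * det(sub)

    # Entry (j, k) of the adjoint is the cofactor of entry (k, j) of a.
    return [[cofactor(k, j) for k in range(n)] for j in range(n)]


def matmul(a, b):
    """Ordinary matrix product of two lists of rows."""
    return [
        [sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
        for i in range(len(a))
    ]


a = [[2, 0, 1], [1, 3, -1], [0, 5, 4]]  # hypothetical matrix
print(det(a), adjoint(a), matmul(a, adjoint(a)))
```

Both products `matmul(a, adjoint(a))` and `matmul(adjoint(a), a)` should come out as `det(a)` times the unit matrix; no division is needed, so the check works over the integers.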


EXAMPLE The adjoint of


-2 2 -1 
is 2-7 74 
“\-1 ¢f (2 


We next show how determinants can be used to obtain a formula for the solution
of a system of linear equations. Again let a be a matrix in $K^n_n$, and suppose that
we want to solve the equation

$$a x = y$$

for the column vector $x$, where $y$ is a given vector in $K^n$. If $\det a \neq 0$, then $a^{-1}$
exists, and the answer is obtained by multiplying the equation by $a^{-1}$:

$$x = a^{-1} y$$

A somewhat different formula for $x$ can be obtained as follows: The equations to
be solved are

$$a^i_j x^j = y^i \qquad (i = 1, \dots, n)$$

Multiply by the cofactor $A^i_k$ and sum over $i$:

$$\sum_i A^i_k a^i_j x^j = \sum_i A^i_k y^i$$

It follows at once from the second equations of Theorem 3.2 and of Corollary 1
above that this equation is simply

$$(\det a) \cdot x^k = \sum_i A^i_k y^i$$

By Theorem 3.2 the right-hand side of this is the determinant of the matrix
$a_{(k)} = (a_1, \dots, a_{k-1}, y, a_{k+1}, \dots, a_n)$ obtained from a by replacing its $k$th column
by $y$. We have then the following theorem:


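THEOREM 3.3 (Cramer's rule) Let a be a nonsingular matrix in $K^n_n$, and let $y$ be a
column vector in $K^n$. Then the unique solution of the equation $a x = y$ is given by
the formula

$$x^k = \frac{\det a_{(k)}}{\det a} \qquad (k = 1, \dots, n)$$

where $a_{(k)}$ denotes the matrix obtained from a by replacing its $k$th column by $y$.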
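
Cramer's rule translates almost word for word into a short program. The Python sketch below is an assumption-laden illustration: the names `det` and `cramer` are hypothetical, the entries are taken to be integers so that exact fractions can be formed with the standard `fractions` module, and the $2 \times 2$ system at the end is made up.

```python
from fractions import Fraction


def det(a):
    """Determinant by expansion along the first row; a is a list of rows."""
    if not a:
        return 1  # determinant of the empty matrix
    return sum(
        (-1) ** j * a[0][j] * det([row[:j] + row[j + 1:] for row in a[1:]])
        for j in range(len(a))
    )


def cramer(a, y):
    """Solve a x = y by Cramer's rule; a must be nonsingular."""
    d = det(a)
    if d == 0:
        raise ValueError("det a = 0: Cramer's rule does not apply")
    solution = []
    for k in range(len(a)):
        # Replace the k-th column of a by the vector y.
        a_k = [row[:k] + [y[i]] + row[k + 1:] for i, row in enumerate(a)]
        solution.append(Fraction(det(a_k), d))
    return solution


a = [[2, 1], [1, 3]]  # hypothetical system: 2x + y = 3, x + 3y = 5
print(cramer(a, [3, 5]))
```

Each unknown costs one further determinant, so for larger systems elimination is usually preferable; the sketch is meant to mirror the statement of the theorem, not to compete with it.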


REMARK. The problem of computing determinants is one of considerable practical
importance. The general formula (2.7) contains $n!$ terms. For example,
for $n = 10$, a rather modest value, there are 3,628,800 terms to be added, each
involving a product of 10 factors. It is plain that (2.7) does not afford a very
practical method of computation. Many special procedures have been devised
for diminishing the labor of computing determinants. Some of them refer to
matrices whose entries occur in rather special patterns; others yield approximate
values of determinants. For ordinary purposes the most practical general method
of computation consists of reducing the matrix in question to triangular form by
means of elementary row and column operations, as described in Sec. 9, Chap. 9.
We recall that an elementary row operation involves adding multiples of one
row in a matrix to the other rows; similarly for elementary column operations.
It follows from Proposition 2.2 and the corollary of Theorem 2.4 that such operations
do not alter the value of the determinant. In Sec. 9, Chap. 9, it was shown
that any matrix can be reduced to triangular form by a series of elementary row
operations, possibly with certain permutations of the rows or columns. The
permutations, if required, do alter the sign of the determinant, but that does not
pose a serious problem. For a triangular matrix it is trivial to compute the determinant
(see Exercise 2, Sec. 2). If a (square) matrix contains several zero
entries, it is sometimes possible to maneuver them into one corner so as to form
a block of zeros. The following theorem is useful in that situation:


THEOREM 3.4 Let a be an $m \times m$ matrix, let b be an $n \times n$ matrix, and let c be an
$m \times n$ matrix, all with coefficients in a field $K$. Then the determinant of the $(m + n) \times
(m + n)$ matrix

$$\begin{pmatrix} a & c \\ 0 & b \end{pmatrix} \tag{3.1}$$

is equal to $(\det a)(\det b)$.†


Proof. Temporarily regard b and c as fixed matrices and a as an arbitrary
matrix. Denote the determinant of (3.1) by $F(a)$. Then it is trivial
to see that $F(a)$ so defined satisfies axioms (1) and (2) of Sec. 2. Therefore,
by (2.6) and (2.7), $F(a) = F(I_m) \cdot \det a$. Now $F(I_m)$ is the determinant of
the matrix

$$\begin{pmatrix} I_m & c \\ 0 & b \end{pmatrix}$$

Denote this determinant by $G(b)$, regarding c as fixed. It is very simple
to verify that $G(b)$ satisfies axioms (1) and (2) for the rows of b, rather
than the columns. In other words, $G(b)$ satisfies axioms (1) and (2) for
the columns of ${}^t b$. Hence, by (2.6), (2.7), and Theorem 2.4, $G(b) =
G(I_n) \cdot \det b$. But $G(I_n)$ is by definition equal to

$$\det \begin{pmatrix} I_m & c \\ 0 & I_n \end{pmatrix}$$

which is equal to 1, since the matrix is triangular and all its diagonal entries
are 1. Since $F(I_m)$ is the same as $G(b)$, the assertion follows at once.
Q.E.D.


Similar results can be proved if the block of zeros occurs in one of the other 
corners. 
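
The reduction to triangular form recommended in the Remark above, together with Theorem 3.4, can be illustrated by a short Python sketch. Everything named here is an assumption made for the illustration: the function names `det_by_elimination` and `block_upper` are hypothetical, only row operations are used, exact arithmetic is done with the standard `fractions` module, and the blocks a, b, c are made-up sample matrices.

```python
from fractions import Fraction


def det_by_elimination(a):
    """Determinant via reduction to upper triangular form by row operations."""
    m = [[Fraction(x) for x in row] for row in a]
    n = len(m)
    sign = 1
    for col in range(n):
        # Find a row at or below the diagonal with a nonzero entry in this column.
        pivot = next((r for r in range(col, n) if m[r][col] != 0), None)
        if pivot is None:
            return Fraction(0)  # the columns are linearly dependent
        if pivot != col:
            m[col], m[pivot] = m[pivot], m[col]
            sign = -sign  # a row interchange reverses the sign
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n):
                m[r][c] -= factor * m[col][c]  # leaves the determinant unchanged
    result = Fraction(sign)
    for i in range(n):
        result *= m[i][i]  # product of the diagonal entries
    return result


def block_upper(a, c, b):
    """Assemble the matrix with blocks a, c in the top rows and 0, b below."""
    m, n = len(a), len(b)
    top = [list(a[i]) + list(c[i]) for i in range(m)]
    bottom = [[0] * m + list(b[i]) for i in range(n)]
    return top + bottom


a = [[2, 1], [1, 3]]
b = [[0, 1], [4, 2]]
c = [[7, 8], [9, 10]]
print(det_by_elimination(block_upper(a, c, b)),
      det_by_elimination(a) * det_by_elimination(b))
```

The number of arithmetic operations grows like $n^3$ rather than $n!$, which is why this route is preferred in practice; the entries of c play no part in the final value, as the theorem asserts.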


EXERCISES 
1. Compute the following determinant in three ways, expanding by the first 
row, expanding by the second column, and reducing to triangular form. 
1-2 0 8 
4 0 2 -4 
2 0 6 0 
1-1 0 2 
2. Compute the determinant of the following matrix. Compute the adjoint
matrix and verify the equation $a \tilde a = (\det a) \cdot I$:


t-1 1 4 
9 #-2 -1 
3 2 t-3 
3, Compute the following determinant: 
ooL, 
I, 6 


(The zeros stand for zero matrices of the appropriate size.) 
† The symbol 0 here denotes the $n \times m$ zero matrix.




4. Compute the determinant and the adjoint of the following matrix with 
coefficients in the field Z;. (See Sec. 10, Chap. 2. The residue class notation intro- 
duced there is used below.) 


I 2 3 
-I 3 38 
2-2 4G 


Verify Corollary 2 of Theorem 3.2 for this matrix. 
5. Solve the following equations using Cramer’s rule: 
Bz) 4 22? = 1 ut 2y 4 
z- 2 = 2 a-—4dy+ z=1 
ax 4+ y-—3z=0 
6. Let 21, 22, 2 be coordinates in a three-dimensional euclidean affine space, and 
let a'jz; = b; (7 = 1, 2,3) be equations of three planes in that space. Prove that the 
intersection of these planes consists of a single point if and only if $\det a \neq 0$.
7. If 


prove that $D_n = a_n D_{n-1} + D_{n-2}$.


4. The characteristic polynomial 


Here we shall have occasion to consider matrices whose entries are polynomials
(such matrices have already appeared in some of the exercises). In Chap. 6
(Theorem 2.5) it was shown that, starting with any field $K$ (or more generally any
integral domain), we can construct from it a new integral domain $K[t]$ whose
elements are all the polynomials

$$f(t) = a_0 + a_1 t + a_2 t^2 + \cdots + a_n t^n$$

in a variable $t$, with coefficients in $K$. If $a_n$ above is not zero, then the expression
for $f(t)$ is unique, and $n$ is called the degree of $f(t)$.

From the domain $K[t]$ we can furthermore construct a field $K(t)$ containing $K[t]$
as a subring, each element of $K(t)$ being a polynomial or else the quotient of two
polynomials (Sec. 2, Chap. 7). Since $K(t)$ is a field, the foregoing discussion of
determinants applies to matrices with coefficients in $K(t)$. For the most part our
matrices will have polynomial entries (including of course polynomials of degree
zero, i.e., elements of $K$). Taking $K$ to be the field of rational numbers, two simple
examples of matrices with coefficients in $Q(t)$ are




3 Pt 10 S41 
1 t ae dee 
72 Of t41 


In order to indicate that a matrix has coefficients which are polynomials in $t$,
possibly but not necessarily of degree greater than zero, we shall sometimes use
such notations as $a(t)$, $b(t)$, etc. It will be convenient to refer to elements of $K$ as
“constants.” A matrix with constant coefficients is thus just a matrix with
coefficients in $K$.

It follows from our earlier definitions (Sec. 6, Chap. 9) that the second matrix 
above is equal to 


too L 00 3 0 0 0 
OO 4f4+¢-[2 0 O)4+%-[0 TL 
Oo 0 1 oO 11 0 0 0, 


an expression in which all the matrices have constant entries. This observation 
leads at once to the following proposition: 


PROPOSITION 4.1 Let $a(t)$ be a matrix with coefficients in $K[t]$ (that is, polynomials in
$t$). Assuming that not all the entries are zero, let $m$ be the greatest of their degrees.
Then $a(t)$ can be expressed in one and only one way in the form

$$a(t) = a_0 + t a_1 + t^2 a_2 + \cdots + t^m a_m$$

where each $a_k$ is a matrix [of the same size as $a(t)$] with constant coefficients.
The proof is trivial.
Of particular importance to us are matrices of the type

$$t I_n - a$$

where a is an $n \times n$ matrix with constant coefficients. (When $n$ is understood, we
shall sometimes write $I$ instead of $I_n$ for the $n \times n$ unit matrix.)


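EXAMPLE 1 For the matrix

$$a = \begin{pmatrix} 1 & 3 & 2 \\ 0 & 1 & -2 \\ 0 & -1 & 2 \end{pmatrix}$$

we have

$$tI - a = t \cdot \begin{pmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{pmatrix} - \begin{pmatrix} 1 & 3 & 2 \\ 0 & 1 & -2 \\ 0 & -1 & 2 \end{pmatrix} = \begin{pmatrix} t - 1 & -3 & -2 \\ 0 & t - 1 & 2 \\ 0 & 1 & t - 2 \end{pmatrix}$$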




THEOREM 4.2 Let a be an $n \times n$ matrix with coefficients in the field $K$. Then the
quantity $\varphi(t) = \det (tI - a)$ is a polynomial of degree $n$ in $t$ with coefficients in $K$.
$\varphi(t)$ is called the characteristic polynomial of the matrix a. Its highest coefficient is 1,
and its constant term is equal to $(-1)^n \det a$. The negative of the coefficient of $t^{n-1}$ in
$\varphi(t)$ is equal to the sum of the diagonal elements of a.†
Proof. As usual let $a^j_k$ be the $(j, k)$ entry of a, and let $\delta^j_k$ be the $(j, k)$
entry of $I$. Then the $(j, k)$ entry of $tI - a$ is by definition equal to $t\delta^j_k -
a^j_k$. The determinant of this matrix is then, by (2.7), equal to

$$\varphi(t) = \det (tI - a) = \sum \operatorname{sign}\begin{pmatrix} 1 & 2 & \cdots & n \\ j_1 & j_2 & \cdots & j_n \end{pmatrix} (t\delta^{j_1}_1 - a^{j_1}_1) \cdots (t\delta^{j_n}_n - a^{j_n}_n)$$

It is obvious from this that $\varphi(t)$ is a polynomial of degree at most $n$. Recall
that $\delta^{j_1}_1 = 0$ unless $j_1 = 1$, and so on. It follows that there can be only
one term in the sum above containing $t^n$, namely, that term for which
$j_1 = 1, j_2 = 2, \dots, j_n = n$. The term is therefore equal to

$$(t - a^1_1)(t - a^2_2) \cdots (t - a^n_n) = t^n - (a^1_1 + a^2_2 + \cdots + a^n_n) t^{n-1} + \text{etc.}$$

This shows that the coefficient of $t^n$ in the polynomial $\varphi(t)$ is 1, as claimed.

Now for any permutation $j_1, \dots, j_n$ of $1, \dots, n$, if any one of $\delta^{j_1}_1,
\delta^{j_2}_2, \dots, \delta^{j_n}_n$ is zero, then another one must also be zero, and in that case

$$(t\delta^{j_1}_1 - a^{j_1}_1) \cdots (t\delta^{j_n}_n - a^{j_n}_n)$$

can have degree at most equal to $n - 2$. Hence the only term of degree
$n - 1$ is the one written above, and the negative of the coefficient of $t^{n-1}$
is indeed the trace of a. Finally, the constant term of $\varphi(t)$ is simply $\varphi(0) =
\det (0 \cdot I - a) = \det (-a) = (-1)^n \det a$. Q.E.D.

EXAMPLE 2 The characteristic polynomial of the matrix a of Example 1 is
$t^3 - 4t^2 + 3t$.
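
The characteristic polynomial can be computed mechanically by applying the permutation sum (2.7) to the matrix $tI - a$, treating each entry as a polynomial. The Python sketch below is an illustration under stated assumptions: polynomials are represented as coefficient lists with the constant term first, the names `poly_mul` and `char_poly` are hypothetical, and the matrix is the one of Example 1 with the entries $(1, 3, 2)$, $(0, 1, -2)$, $(0, -1, 2)$.

```python
from itertools import permutations


def poly_mul(p, q):
    """Product of two polynomials given as coefficient lists (constant term first)."""
    out = [0] * (len(p) + len(q) - 1)
    for i, pi in enumerate(p):
        for k, qk in enumerate(q):
            out[i + k] += pi * qk
    return out


def char_poly(a):
    """Coefficients [c_0, c_1, ..., c_n] of det(tI - a), computed from (2.7)."""
    n = len(a)
    total = [0] * (n + 1)
    for js in permutations(range(n)):
        inversions = sum(js[i] > js[k] for i in range(n) for k in range(i + 1, n))
        term = [(-1) ** inversions]  # sign of the permutation
        for col, row in enumerate(js):
            # Entry (row, col) of tI - a as a polynomial: -a[row][col] + delta * t.
            entry = [-a[row][col], 1 if row == col else 0]
            term = poly_mul(term, entry)
        for power, coeff in enumerate(term):
            total[power] += coeff
    return total


a = [[1, 3, 2], [0, 1, -2], [0, -1, 2]]
print(char_poly(a))
```

Reading the result from the constant term upward gives the constant term $(-1)^n \det a$, the coefficient of $t^{n-1}$ equal to the negative of the trace, and highest coefficient 1, as Theorem 4.2 states.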

We now come to an important theorem about characteristic polynomials. Let a
again denote an $n \times n$ matrix with coefficients in the field $K$, and let

$$f(t) = c_0 + c_1 t + c_2 t^2 + \cdots + c_m t^m$$

be a polynomial, also with coefficients in $K$. We introduce the notation

$$f(a) = c_0 I_n + c_1 a + c_2 a^2 + \cdots + c_m a^m$$

Thus $f(a)$ is an $n \times n$ matrix obtained by putting a for the indeterminate $t$ (and
the unit matrix for 1). Similar notation has already occurred in Sec. 4, Chap. 9.


THEOREM 4.3 (Cayley-Hamilton) Let $\varphi(t)$ be the characteristic polynomial of an
$n \times n$ matrix a with coefficients in $K$. Then $\varphi(a) = 0$.

Proof. By definition, $\varphi(t) = \det (tI - a)$. One might naively try to
substitute a for $t$ in this determinant. The result would be $\det (aI - a) =$

† The sum of the diagonal elements is called the trace of a, denoted by Tr a.




$\det (a - a) = 0$, all right; but that has nothing at all to do with the
theorem we want to prove, for the zero here is a scalar, whereas the theorem
says that $\varphi(a)$ is the $n \times n$ matrix whose entries are all zero. That
point being settled, we go on with the proof.

Let us write $b(t)$ for $tI - a$. Then $\det b(t) = \varphi(t)$, of course. Now let
$\tilde b(t)$ denote the adjoint matrix of $b(t)$ (Definition 3.2). The entries of $\tilde b(t)$
are then the cofactors of the elements of $b(t)$ (Definition 3.1), and it is
readily seen from (2.7) that those entries are therefore polynomials in $t$ of
degree at most $n - 1$. Consequently we can write $\tilde b(t)$ uniquely in the
form

$$\tilde b(t) = b_0 + t b_1 + \cdots + t^{n-1} b_{n-1}$$

where the $b_k$ are constant matrices (see Proposition 4.1). Now by Corollary
2 of Theorem 3.2 we have $b(t) \cdot \tilde b(t) = \det b(t) \cdot I$. That is,

$$(tI - a)(b_0 + t b_1 + \cdots + t^{n-1} b_{n-1}) = \varphi(t) \cdot I \tag{4.1}$$

Expanding the left side out [using the rules (6.11), Chap. 9] we get

$$-a b_0 + t(b_0 - a b_1) + \cdots + t^{n-1}(b_{n-2} - a b_{n-1}) + t^n b_{n-1} = \varphi(t) \cdot I \tag{4.2}$$


By Proposition 4.1 this is an identity, meaning that the matrices on either
side of the equation with a given power $t^k$ of $t$ must be the same. In other
words, if say

$$\varphi(t) = c_0 + c_1 t + \cdots + c_{n-1} t^{n-1} + t^n$$

then we must have

$$-a b_0 = c_0 \cdot I, \qquad b_0 - a b_1 = c_1 \cdot I, \qquad \text{etc.}$$

Consequently, (4.2) must remain true if we replace $t$ by the matrix a, or by
any other $n \times n$ matrix. Now we would like to substitute a for $t$ in (4.1).
The left-hand side would clearly become zero, allowing us to conclude that
$\varphi(a)$ must be zero. The trouble here is that the coefficients of $t$ are
matrices, not scalars, and there is no guarantee that a polynomial identity
involving matrices remains true if we substitute a matrix for $t$, because
matrix multiplication is not commutative in general. Hence, even though
(4.2) does remain true if we substitute a for $t$, it does not follow automatically
that (4.1) remains true under the same circumstances. To see
what happens here, suppose we had first substituted some matrix $s$ for
the variable $t$ in (4.1) and had then expanded out the left-hand side. We
should have got

$$(s - a)(b_0 + s b_1 + \cdots + s^{n-1} b_{n-1}) = -a b_0 + (s b_0 - a s b_1) + \cdots + (s^{n-1} b_{n-2} - a s^{n-1} b_{n-1}) + s^n b_{n-1}$$

which is not the same as the result of substituting $s$ for $t$ in (4.2) unless
$s a = a s$. But for the special case $s = a$ that is certainly true, and therefore
substitution of a for $t$ in the left member of (4.1) gives the same result,
namely, 0, as substituting a for $t$ in the left member of (4.2). And since
the latter equation remains true with a in place of $t$, we conclude that
$\varphi(a) = 0$. Q.E.D.


EXAMPLE 3 For the matrix a of Example 1, we have $a^3 - 4a^2 + 3a = 0$ (cf.
Example 2). This is easily verified directly.
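
The direct verification mentioned in Example 3 can be carried out as follows. This Python sketch is an illustration only: `matmul` and `poly_at_matrix` are hypothetical helper names, the polynomial is given by its coefficient list with the constant term first, the evaluation uses Horner's rule (a choice made here, not one taken from the text), and the matrix entries are those of Example 1 read as $(1, 3, 2)$, $(0, 1, -2)$, $(0, -1, 2)$.

```python
def matmul(a, b):
    """Ordinary matrix product of two lists of rows."""
    return [
        [sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
        for i in range(len(a))
    ]


def poly_at_matrix(coeffs, a):
    """Evaluate c_0 I + c_1 a + ... + c_m a^m, with coeffs = [c_0, ..., c_m]."""
    n = len(a)
    result = [[0] * n for _ in range(n)]
    for c in reversed(coeffs):
        result = matmul(result, a)  # multiply the running value by a
        for i in range(n):
            result[i][i] += c       # then add c times the unit matrix
    return result


a = [[1, 3, 2], [0, 1, -2], [0, -1, 2]]
phi = [0, 3, -4, 1]  # t^3 - 4 t^2 + 3 t, constant term first
print(poly_at_matrix(phi, a))
```

The result is the $3 \times 3$ zero matrix, a matrix and not a scalar, which is exactly the distinction stressed at the start of the proof of Theorem 4.3.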

The question of substituting a matrix for a variable, as in the foregoing proof, 
will arise from time to time, and we therefore insert the following theorem, which 
covers many cases of importance, although not the case with which we have just 
been occupied. 


THEOREM 4.4 Any identity among polynomials in $t$ with coefficients in $K$ remains
true if $t$ is replaced by any $n \times n$ matrix a with coefficients in $K$.

The proof is quite trivial, essentially the same as the proof of Theorem 2.4, 
Chap. 6. The point is that the only matrices which will appear after the substitu- 
tion are various powers of a, and they all commute with each other. 

As an immediate application we have the following theorem: 


THEOREM 4.5 Let a be an $n \times n$ matrix with coefficients in $K$. Then any positive
power of a can be expressed as a linear combination, with coefficients in $K$, of $I$, a,
$a^2, \dots, a^{n-1}$. If $\det a \neq 0$, then any negative power of a can be expressed in the
same way.
Proof. Let $m$ be a positive integer. By the division algorithm (Proposition
3.2, Chap. 6) there exist polynomials $q(t)$ and $r(t)$ in $K[t]$ such that
$t^m = q(t) \cdot \varphi(t) + r(t)$, $\varphi(t)$ being the characteristic polynomial of a; and
$\deg r < \deg \varphi$ or else $r = 0$. By the preceding theorem we can put a for $t$
in this equation, getting $a^m = q(a) \cdot \varphi(a) + r(a)$. By the Cayley-Hamilton
theorem, $\varphi(a) = 0$, and so $a^m = r(a)$. The right-hand side is indeed
a linear combination of powers of a not exceeding the $(n - 1)$th power.
For negative powers, we recall that $\varphi(0) = (-1)^n \det a$ (Theorem 4.2).
Hence $\varphi(0) \neq 0$ if $\det a \neq 0$, and then clearly $\varphi(t)$ is not divisible by the
irreducible polynomial $t$ (Theorem 4.1, Chap. 6). Hence $t^m$ and $\varphi(t)$ are
relatively prime, and so there exist polynomials $g(t)$ and $h(t)$ in $K[t]$ such
that $1 = g(t) \cdot t^m + h(t) \cdot \varphi(t)$. We can assume moreover that $\deg g(t) <
n$, for otherwise let $q(t)$ be the quotient and $r(t)$ the remainder upon division
of $g(t)$ by $\varphi(t)$, so that $g(t) = q(t) \cdot \varphi(t) + r(t)$. Then we have
$1 = r(t) \cdot t^m + (q(t) \cdot t^m + h(t)) \cdot \varphi(t)$. By Theorem 4.4, this equation
remains valid upon substitution of a for $t$ (and $I$ for 1 on the left). Since
$\varphi(a) = 0$, we get $I = r(a) \cdot a^m$. Multiplying by $a^{-m}$ we obtain finally
$a^{-m} = r(a)$. Q.E.D.




REMARK. In many instances the methods just described for computing $a^{\pm m}$ are
quite useful; they replace matrix calculations in part by polynomial calculations.
We point out here the following short cut for $a^{-1}$: If the characteristic polynomial
is $\varphi(t) = c_0 + c_1 t + \cdots + c_{n-1} t^{n-1} + t^n$, then we have (Theorem 4.3)

$$c_0 I + c_1 a + \cdots + c_{n-1} a^{n-1} + a^n = 0$$

and multiplication by $a^{-1}$ gives us the formula

$$a^{-1} = -\frac{1}{c_0} \left( c_1 I + c_2 a + \cdots + c_{n-1} a^{n-2} + a^{n-1} \right)$$
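
The short cut for $a^{-1}$ can be sketched in a few lines of Python. The sketch rests on assumptions that are not in the text: the names `matmul` and `inverse_from_char_poly` are hypothetical, the coefficients of the characteristic polynomial are supplied by the caller (constant term first, leading coefficient 1), the entries are integers so that the standard `fractions` module gives exact results, and the $2 \times 2$ matrix with characteristic polynomial $t^2 - 3t + 1$ is a made-up example.

```python
from fractions import Fraction


def matmul(a, b):
    """Ordinary matrix product of two lists of rows."""
    return [
        [sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
        for i in range(len(a))
    ]


def inverse_from_char_poly(a, coeffs):
    """Inverse of a from its characteristic polynomial c_0 + c_1 t + ... + t^n.

    coeffs = [c_0, c_1, ..., c_{n-1}, 1]; requires c_0 != 0, i.e. det a != 0.
    """
    c0 = coeffs[0]
    if c0 == 0:
        raise ValueError("c_0 = 0, so det a = 0 and a has no inverse")
    n = len(a)
    acc = [[0] * n for _ in range(n)]
    # Horner's rule for c_1 I + c_2 a + ... + c_{n-1} a^{n-2} + a^{n-1}.
    for c in reversed(coeffs[1:]):
        acc = matmul(acc, a)
        for i in range(n):
            acc[i][i] += c
    return [[Fraction(-x, c0) for x in row] for row in acc]


a = [[2, 1], [1, 1]]  # hypothetical matrix; phi(t) = t^2 - 3t + 1
inv = inverse_from_char_poly(a, [1, -3, 1])
print(inv, matmul(a, inv))
```

Only powers of a up to the $(n - 1)$th are formed, in line with Theorem 4.5; the division by $c_0 = (-1)^n \det a$ is the single place where nonsingularity is used.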


Theorem 4.3 says that any $n \times n$ matrix a with coefficients in $K$ satisfies a
certain polynomial equation of degree $n$ with coefficients in $K$. Now it may happen
that there is a polynomial $f(t)$ in $K[t]$ of degree less than $n$ such that $f(a) = 0$.
Among all the polynomials $f(t)$ of lowest possible degree such that $f(a) = 0$ there
is clearly a unique monic polynomial, say $\varphi_0(t)$. It is called the minimal polynomial
of a. There is a close connection between the characteristic and minimal
polynomials, as the theorem below shows. They may indeed be identical. To
give an example where they are not the same, we mention that the characteristic
polynomial of the $n \times n$ unit matrix is $(t - 1)^n$; its minimal polynomial is $t - 1$.
In Theorem 4.5 and its proof the characteristic polynomial can be replaced by
the minimal polynomial.


THEOREM 4.6 Let a be an $n \times n$ matrix with coefficients in K. Let $\varphi(t)$ and $\varphi_0(t)$
denote its characteristic and minimal polynomials, respectively. Then $\varphi_0(t)$ divides
$\varphi(t)$, and $\varphi(t)$ divides $\varphi_0(t)^n$. The prime factorizations of $\varphi(t)$ and $\varphi_0(t)$ contain the
same irreducible factors.
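Proof. From the division algorithm we can write $\varphi(t) = q(t) \cdot \varphi_0(t) + r(t)$, where q and r are certain polynomials such that $\deg r < \deg \varphi_0$ or else $r = 0$. Putting a for t in the equation, we get $r(a) = 0$, using Theorems 4.3 and 4.4. Therefore $r(t) = 0$, by definition of $\varphi_0(t)$.
To prove that $\varphi(t) \mid \varphi_0(t)^n$ we use the following device: Let $\varphi_0(t) = t^r + c_1 t^{r-1} + \cdots + c_{r-1} t + c_r$. We define new matrices $w_j$ by

    $w_j = a^j + c_1 a^{j-1} + \cdots + c_{j-1} a + c_j I \qquad (j = 0, 1, \ldots, r - 1)$

In particular, $w_0 = I$ and $w_{r-1} = a^{r-1} + c_1 a^{r-2} + \cdots + c_{r-1} I$.
Clearly $a w_j = w_{j+1} - c_{j+1} I$ for $j < r - 1$, and $a w_{r-1} = \varphi_0(a) - c_r I = -c_r I$, since $\varphi_0(a) = 0$. Now put

    $w(t) = t^{r-1} w_0 + t^{r-2} w_1 + \cdots + t\, w_{r-2} + w_{r-1}$

From the equations above we have

    $(tI - a)\, w(t) = t^r w_0 + t^{r-1} w_1 + \cdots + t\, w_{r-1}$
    $\qquad\qquad - t^{r-1} a w_0 - t^{r-2} a w_1 - \cdots - t\, a w_{r-2} - a w_{r-1}$
    $\qquad = t^r I + t^{r-1} w_1 + \cdots + t\, w_{r-1}$
    $\qquad\qquad - t^{r-1} (w_1 - c_1 I) - \cdots - t\, (w_{r-1} - c_{r-1} I) + c_r I$
    $\qquad = (t^r + c_1 t^{r-1} + \cdots + c_{r-1} t + c_r)\, I$
    $\qquad = \varphi_0(t) \cdot I$

Taking determinants of both sides and using Theorem 2.5, we get $\varphi(t) \cdot \det w(t) = \varphi_0(t)^n$. Since $\det w(t)$ is a polynomial, the assertion is proved. If an irreducible polynomial $p(t)$ divides $\varphi(t)$, then it must divide $\varphi_0(t)^n$, hence also $\varphi_0(t)$ (Theorem 8.5, Chap. 6). It follows at once that $\varphi(t)$ and $\varphi_0(t)$ have the same irreducible factors.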


COROLLARY If $f(t)$ is any polynomial in $K[t]$, then the matrix $f(a)$ is nonsingular if
and only if $f(t)$ and $\varphi(t)$ are relatively prime. If $g(t)$ is any polynomial of positive
degree which divides $\varphi(t)$, then $\det g(a) = 0$.

Proof. Let $g(t)$ be the g.c.d. of $f(t)$ and $\varphi(t)$; and let $p(t)$ be an irreducible factor of $g(t)$, assuming that $\deg g(t) > 0$. Then, by the theorem just proved, $p(t)$ also divides the minimal polynomial $\varphi_0(t)$, say $\varphi_0(t) = q(t) \cdot p(t)$. Then $\varphi_0(a) = q(a) \cdot p(a) = 0$. It follows that $\det p(a) = 0$, for otherwise $p(a)$ would have an inverse and it would follow that $q(a) = 0$, contradicting the definition of $\varphi_0(t)$. Since $p(t)$ divides $g(t)$, we have, say, $g(t) = q_1(t) \cdot p(t)$, and so $g(a) = q_1(a) \cdot p(a)$. Hence $\det g(a) = \det q_1(a) \cdot \det p(a) = 0$. Similarly, $\det f(a) = 0$.

From this it follows that if $f(t)$ and $\varphi(t)$ are not relatively prime, or if $g(t)$ divides $\varphi(t)$, then $\det f(a) = \det g(a) = 0$. If $f(t)$ and $\varphi(t)$ are relatively prime, then there exist polynomials $r(t)$, $s(t)$ such that $1 = r(t) f(t) + s(t) \varphi(t)$. From Theorems 4.3 and 4.4 we have then $I = r(a) \cdot f(a)$, showing that $f(a)$ is nonsingular. Q.E.D.


The important part played by the characteristic polynomial will be clarified in 
the following section and in Chap. 13. 


EXERCISES. 


1. Use Theorem 4.5 to express a‘ and a~* as linear combinations of I, a, a’, where 


1 0 2 
aa=([2 1 :) 
Oo -1 4, 


2. Show how to compute the characteristic polynomial of a triangular matrix.
If a is an $n \times n$ triangular matrix whose diagonal elements are all zero, prove that $a^n = 0$.

3. Let a be an $n \times n$ matrix such that $a^m = 0$ for some positive integer m.
Prove that $(I - a)^{-1} = I + a + a^2 + \cdots + a^{m-1}$.

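4. Let $f(t) = c_0 + c_1 t + \cdots + c_{n-1} t^{n-1} + t^n$ be a polynomial in $K[t]$. Show
that the characteristic polynomial of the matrix

    $\begin{pmatrix}
    0 & 0 & 0 & \cdots & 0 & -c_0 \\
    1 & 0 & 0 & \cdots & 0 & -c_1 \\
    0 & 1 & 0 & \cdots & 0 & -c_2 \\
    \vdots & & & & & \vdots \\
    0 & 0 & \cdots & 1 & 0 & -c_{n-2} \\
    0 & 0 & \cdots & 0 & 1 & -c_{n-1}
    \end{pmatrix}$

is $f(t)$. (Hint: Expand the required determinant by the first column and use induction on n.)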

5. Show that a square matrix and its transpose have the same characteristic 
polynomial. 

6, Write a matrix whose characteristic polynomial is ~ & + 1 and verify it. 

7. Determine the minimal and characteristic polynomials of the matrix 


0 1 2 «90 
0 0-3 4 
09 0 0 & 
9 0 0 0 


8. If a is a nonsingular $n \times n$ matrix with characteristic polynomial $\varphi(t)$, show
that $a^{-1}$ has $(-1)^n \cdot (1/\det a) \cdot t^n \varphi(1/t)$ as its characteristic polynomial. Verify
this for the matrix of Example 1 above.


5. Eigenvalues and eigenvectors 


One of the most commonly occurring problems involving vector spaces and linear 
mappings is the following: Given a linear mapping T of a vector space U to itself, 
does there exist a nonzero vector in U which is mapped by T into itself—or more 
generally, into some multiple of itself? That is, does the vector equation 


T(x) = px 


have a solution for p and x (other than zero)? 

The importance of this problem, both in mathematics and its applications, can 
hardly be overstated. Indeed it would be fair to say that this problem and some 
of its ramifications constitute one of the most important applications of mathe- 
matics to science. 

If the equation above holds, then we have $T(\alpha x) = \alpha \cdot T(x) = \alpha p \cdot x$ for any scalar $\alpha$. Hence T maps the one-dimensional subspace of U spanned by x into itself. Therefore our problem can be stated as follows: Is there a one-dimensional subspace of U which is mapped into itself by T?

In the case of a euclidean vector space U (Sec. 11, Chap. 8) a nonzero vector x can be thought of as determining a direction. Then x and px represent either the same direction or else opposite directions, for any scalar $p \neq 0$. Hence our problem, in geometrical terms, is that of determining whether there is some direction which is either unchanged by T, or else simply reversed.

In this section we shall discuss this so-called ‘eigenvalue problem” for finite- 
dimensional vector spaces. We begin by stating the Cayley-Hamilton theorem in 
a somewhat changed form. We require first the following theorem: 


THEOREM 5.1 Let a, b be two $n \times n$ matrices with coefficients in a field K, with b nonsingular. Then a and $b^{-1}ab$ have the same characteristic polynomial.
Proof. $b^{-1}(tI - a)b = tI - b^{-1}ab$. Since $(\det b^{-1}) \cdot (\det b) = \det (b^{-1}b) = \det I = 1$, we have $\det (tI - a) = \det [b^{-1}(tI - a)b] = \det (tI - b^{-1}ab)$. Q.E.D.
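
Theorem 5.1 is easy to spot-check. The sketch below is an illustration only: the matrices a, b and the helper names are assumptions made here. It evaluates $\det(tI - a)$ and $\det(tI - b^{-1}ab)$ at several values of t; two monic polynomials of degree n that agree at n distinct points are identical, so agreement at four points settles the $3 \times 3$ case.

```python
def det(m):
    """Determinant by expansion along the first row (fine for small matrices)."""
    if len(m) == 1:
        return m[0][0]
    return sum((-1) ** j * m[0][j] * det([row[:j] + row[j + 1:] for row in m[1:]])
               for j in range(len(m)))

def mat_mul(x, y):
    n = len(x)
    return [[sum(x[i][k] * y[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def char_poly_at(a, t):
    """Value of det(tI - a) at the scalar t."""
    n = len(a)
    return det([[(t if i == j else 0) - a[i][j] for j in range(n)] for i in range(n)])

# Illustrative data (assumptions): b is unipotent, so its inverse has integer entries.
a = [[1, 2, 0], [0, 1, 3], [4, 0, 1]]
b = [[1, 1, 0], [0, 1, 1], [0, 0, 1]]
b_inv = [[1, -1, 1], [0, 1, -1], [0, 0, 1]]

conj = mat_mul(mat_mul(b_inv, a), b)      # the matrix b^{-1} a b
same = all(char_poly_at(a, t) == char_poly_at(conj, t) for t in range(4))
```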


COROLLARY Let $T: U \to U$ be a linear mapping of an n-dimensional† vector space
over a field K to itself. Then the characteristic polynomial of the matrix of T relative
to a base in U depends only on T and not on the particular choice of base.

Proof. Let a be the matrix of T relative to a base $\{u_i\}$, and let a' be its matrix relative to a second base $\{u'_i\}$. If b is the matrix from $\{u_i\}$ to $\{u'_i\}$, then $a' = b^{-1}ab$, by Theorem 7.2, Chap. 9 [see Eq. (7.4)]. The assertion then follows at once from the theorem above. Q.E.D.


The corollary shows that the characteristic polynomial is associated with T and 
not just with a particular matrix representing T. We make the following definition: 


DEFINITION 5.1 The characteristic polynomial of a linear mapping $T: U \to U$ of an
n-dimensional vector space is defined to be the characteristic polynomial $\varphi(t)$ of the
matrix of T relative to any base $u_1, \ldots, u_n$ of U. The quantity $(-1)^n \cdot \varphi(0)$ is
defined to be the determinant of T.

Thus the determinant of T is just the determinant of the matrix of T relative to 
any base. 

Theorem 4.3 can now be stated for linear mappings. We first introduce the following rather obvious notation: If $f(t) = c_0 + c_1 t + \cdots + c_r t^r$ is any polynomial with coefficients in a field K, and if $T: U \to U$ is an endomorphism of a vector space over K, then by $f(T)$ we denote the linear mapping

    $f(T) = c_0 I + c_1 T + \cdots + c_r T^r$

where I as usual denotes the identity mapping (cf. Sec. 4, Chap. 9).


† We naturally assume $n > 0$ to avoid trivialities.


THEOREM 5.2 (Cayley-Hamilton) Let T be an endomorphism of a finite-dimensional
vector space U, and let $\varphi(t)$ be its characteristic polynomial. Then $\varphi(T) = 0$.
Proof. Let a be the matrix of T relative to some base in U. Then the matrix of $\varphi(T)$ is $\varphi(a)$, by Theorem 6.4, Chap. 9. Hence the matrix of $\varphi(T)$ is 0, by Theorem 4.3. That is, $\varphi(T)$ maps every vector in U into the zero vector. Q.E.D.


Remark 1. By a similar argument, Theorem 4.5 holds with T in place of the matrix a. The definition of the minimal polynomial extends at once to endomorphisms T of finite-dimensional vector spaces; and it is easily seen that the minimal polynomial of T, that is, the monic polynomial $\varphi_0(t)$ of lowest degree, with coefficients in the field of scalars, such that $\varphi_0(T) = 0$, is the same as the minimal polynomial of any matrix representing T. Theorem 4.6 and its corollary hold for T. In particular, we have the following theorem:


THEOREM 5.3 Let T be an endomorphism of a finite-dimensional vector space U over a
field K, and let $\varphi(t)$ be its characteristic polynomial. Then there exists a nonzero
vector x in U such that

5.1    $T(x) = px$

for a scalar p in K if and only if $\varphi(p) = 0$.

Proof. Equation (5.1) is the same as $(pI - T)(x) = 0$, by definition of $pI - T$, where I is the identity mapping. There is a nonzero vector x which is mapped into zero by $pI - T$ if and only if this operator has nullity $> 0$, and that is so if and only if its rank is $< n$ (Theorem 3.2, Chap. 9). If a is the matrix of T relative to some base, then $pI - a$ is the matrix of $pI - T$ relative to the same base, and the rank of $pI - T$ is the same as the rank of $pI - a$ (Theorem 8.2, Chap. 9). Hence (5.1) has a nonzero solution if and only if the matrix $pI - a$ has rank $< n$, which is the case if and only if its determinant is zero (corollary to Theorem 3.1 above). But $\det (pI - a) = \varphi(p)$. Q.E.D.


As a matrix version of the same theorem we have the following corollary:


COROLLARY Let a be an $n \times n$ matrix with coefficients in a field K, and let p be an
element of K. Then there is a nonzero column vector x in $K^n$ such that $ax = px$ if and
only if $\varphi(p) = 0$, where $\varphi$ is the characteristic polynomial of a.


DEFINITION 5.2 T being a linear mapping of a finite-dimensional vector space U to
itself, the roots of the characteristic polynomial of T which are in the field of scalars are
called the eigenvalues of T (or of the matrix of T relative to any base for U). If p is an
eigenvalue of T, then any vector x in U such that $T(x) = px$ is called an eigenvector of
T belonging to p.

The term characteristic root is sometimes used instead of eigenvalue.


EXAMPLE 1 Consider the mapping of $Q^2$ to itself that maps a column vector x into
ax, where

    $a = \begin{pmatrix} 1 & 1 \\ 0 & 1 \end{pmatrix}$

The characteristic polynomial is $(t - 1)^2$. Hence 1 is the only eigenvalue. According to Theorem 5.3, the equation $ax = px$ has a nontrivial solution only if $p = 1$. Hence the equation to be solved is $ax = x$, or

    $x^1 + x^2 = x^1$
    $x^2 = x^2$

or simply $x^2 = 0$. Hence the eigenvectors belonging to the root 1 are the vectors

    $\begin{pmatrix} x \\ 0 \end{pmatrix}$

where x is an arbitrary element of Q, and these are the only eigenvectors.


EXAMPLE 2 Consider the mapping $R^2 \to R^2$ that sends an arbitrary column vector x
into ax, where

    $a = \begin{pmatrix} 0 & -1 \\ 1 & 0 \end{pmatrix}$

The characteristic polynomial is $t^2 + 1$. There are no (real) eigenvalues. If we replace R by the complex field C, then the eigenvalues are $\pm i$, and Eq. (5.1) becomes $ax = ix$ or $ax = -ix$. Taking the + sign, our equation $ax = ix$ is

    $-x^2 = i x^1$
    $x^1 = i x^2$

If we put $x^1 = c$, then $x^2 = -ic$, and it is quickly seen that the eigenvectors belonging to $+i$ are all the vectors

    $\begin{pmatrix} c \\ -ic \end{pmatrix} = c \cdot \begin{pmatrix} 1 \\ -i \end{pmatrix}$    (c any complex number)

Similarly, the eigenvectors belonging to $-i$ are all the vectors

    $\begin{pmatrix} c \\ ic \end{pmatrix} = c \cdot \begin{pmatrix} 1 \\ i \end{pmatrix}$    (c any complex number)
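
The eigenvectors of Example 2 can be verified directly with complex arithmetic. In this small sketch the helper name apply and the sample value of c are assumptions; the vectors tested are $c \cdot (1, -i)$ for the eigenvalue $+i$ and $c \cdot (1, i)$ for $-i$.

```python
def apply(m, v):
    """Product of the matrix m with the column vector v."""
    return [sum(m[i][k] * v[k] for k in range(len(v))) for i in range(len(m))]

a = [[0, -1], [1, 0]]

c = 2 + 3j                        # any complex number would do; this one is arbitrary
v_plus = [c, -1j * c]             # c * (1, -i), claimed to belong to +i
v_minus = [c, 1j * c]             # c * (1,  i), claimed to belong to -i

image_plus = apply(a, v_plus)
image_minus = apply(a, v_minus)
```

Both images are scalar multiples of the original vectors, with factors $i$ and $-i$ respectively.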


REMARK 2. If p is an eigenvalue of a linear mapping $T: U \to U$, then all the eigenvectors of T belonging to p form a subspace V of U. For V is simply the kernel of the mapping $pI - T$ (see Proposition 3.1, Chap. 9). The dimension of V is called the multiplicity of the eigenvalue p. In Example 2 above, both $i$ and $-i$ have multiplicity 1. In Example 1, the root 1 also has multiplicity 1, although it is a two-fold root of the characteristic polynomial. In general the multiplicity of an eigenvalue need not be the same as its multiplicity in the characteristic polynomial. A connection is established in a special case in Theorem 5.5.


EXAMPLE 3 Let U be a vector space of dimension n over a field K, and let $u_1, u_2, \ldots, u_n$ be a base for U. Define a mapping $T: U \to U$ by $T(u_i) = p_i u_i$ ($i = 1, \ldots, n$), where $p_1, p_2, \ldots, p_n$ are arbitrary scalars (Proposition 5.1, Chap. 9). Then the $p_i$ are the eigenvalues of T, for the matrix of T relative to the given base is the diagonal matrix whose diagonal elements are $p_1, \ldots, p_n$. Consider now one of the p's, and suppose it appears exactly k times in the given list, say as $p_1, p_2, \ldots, p_k$ for simplicity. It is then easily seen that the eigenvectors corresponding to that eigenvalue are precisely the vectors in the subspace of U spanned by $u_1, \ldots, u_k$, and so the eigenvalue in question has multiplicity k.


THEOREM 5.4 Let $T: U \to U$ be an endomorphism of a vector space U of dimension n
over a field K, and let $p_1, p_2, \ldots, p_r$ be distinct eigenvalues of T (in the field K).
Let $v_1, v_2, \ldots, v_r$ be nonzero eigenvectors belonging to $p_1, p_2, \ldots, p_r$, respectively.
Then those vectors are linearly independent.
Proof. We show that $v_1, v_2, \ldots, v_k$ are linearly independent for $k = 1, 2, \ldots, r$, by induction. The assertion that $v_1, v_2, \ldots, v_k$ are independent is true for $k = 1$, since $v_1 \neq 0$ by assumption. Suppose now that the assertion is true for some $k < r$, but false for $k + 1$. Then there are scalars $c_1, c_2, \ldots, c_{k+1}$, not all zero, such that $c_1 v_1 + c_2 v_2 + \cdots + c_{k+1} v_{k+1} = 0$. Applying T to this equation and keeping in mind that $T(v_i) = p_i v_i$, we get $c_1 p_1 v_1 + c_2 p_2 v_2 + \cdots + c_{k+1} p_{k+1} v_{k+1} = 0$. Multiply the first equation by $p_{k+1}$ and subtract the result from the second equation. We get

5.2    $\sum_{i=1}^{k} c_i (p_i - p_{k+1}) v_i = 0$

Now $p_i - p_{k+1} \neq 0$ ($i = 1, \ldots, k$), because the p's are assumed distinct. Furthermore $c_1, c_2, \ldots, c_k$ cannot all be zero, for otherwise our first equation above would reduce to $c_{k+1} v_{k+1} = 0$, with $c_{k+1} \neq 0$, and that is impossible because of our assumption that $v_{k+1} \neq 0$. Therefore (5.2) shows that $v_1, \ldots, v_k$ are linearly dependent, a contradiction. Hence the assumption that $v_1, \ldots, v_{k+1}$ are linearly dependent is untenable. Q.E.D.


THEOREM 5.5 Let $T: U \to U$ be an endomorphism of an n-dimensional vector space
over a field K, and suppose that T has n distinct eigenvalues $p_1, p_2, \ldots, p_n$ (in K).
If $v_1, v_2, \ldots, v_n$ are nonzero eigenvectors of T belonging to $p_1, p_2, \ldots, p_n$, respectively, then they form a base for U; and the matrix of T relative to that base is the
diagonal matrix having $p_1, \ldots, p_n$ as its diagonal elements.

This is an immediate corollary of Theorem 5.4.

If the characteristic polynomial of a linear mapping has repeated roots, as in Example 1 above, then the situation is not so simple. A careful analysis is given in Chap. 13. Nevertheless, the last assertion of Theorem 5.5 is valid for any endomorphism T provided that $v_1, v_2, \ldots, v_n$ is a base of eigenvectors of T.
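
As a concrete instance of Theorem 5.5, the sketch below takes a $2 \times 2$ matrix with the two distinct eigenvalues 1 and 3 and changes to the base formed by its eigenvectors. Everything in it is an assumption made for illustration: the matrix a, the helper names, and the convention that the columns of b hold the new base vectors, so that the matrix relative to the new base is $b^{-1}ab$.

```python
from fractions import Fraction

def mat_mul(x, y):
    n = len(x)
    return [[sum(x[i][k] * y[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def inverse_2x2(m):
    """Inverse of a nonsingular 2 x 2 matrix by the adjugate formula."""
    (p, q), (r, s) = m
    d = Fraction(p * s - q * r)
    return [[s / d, -q / d], [-r / d, p / d]]

a = [[2, 1], [1, 2]]              # characteristic polynomial (t - 1)(t - 3)
b = [[1, 1], [-1, 1]]             # columns: (1, -1) belongs to 1, (1, 1) belongs to 3

diag = mat_mul(mat_mul(inverse_2x2(b), a), b)     # matrix relative to the new base
```

The result is the diagonal matrix with entries 1 and 3, in the order in which the eigenvectors were listed.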


REMARK 3. The problem of computing eigenvectors is in general rather tedious. It is first of all necessary to determine the eigenvalues, i.e., the roots of the characteristic polynomial of the given mapping $T: U \to U$. If p is an eigenvalue of T, then the eigenvectors that belong to it constitute the kernel of the mapping $pI - T$. That kernel can be computed by the method developed in Sec. 9, Chap. 9, for as we have seen in connection with (9.30), Chap. 9, if the matrix of $pI - T$ relative to some base pair is in diagonal form, then the kernel and image can be read off immediately. But we point out that the reduction process of Sec. 9, Chap. 9, involves at each step a pair of bases, one for the first space and another for the second space. Here the two spaces are the same, but nevertheless it is necessary to consider the matrix of $pI - T$ relative to a pair of possibly different bases. In general it is not possible to find a single base for which the matrix of an endomorphism is a diagonal matrix. This question will be taken up in Chap. 13.


THEOREM 5.6 Let T be an endomorphism of a finite-dimensional vector space U, and
let W be a subspace (of positive dimension) which is mapped into itself by T. Let T'
denote the mapping T restricted to elements of W. Then the characteristic polynomial of
T' divides the characteristic polynomial of T.

Proof. Let $\dim U = n$ and $\dim W = r$. Then we can find a base $u_1, \ldots, u_n$ of U such that the first r elements form a base for W (Theorem 9.8, Chap. 8). If a is the matrix of T relative to the base $u_1, \ldots, u_n$, then a must have the form

    $a = \begin{pmatrix} a' & b \\ 0 & c \end{pmatrix}$

where a' is the matrix of T' relative to $u_1, \ldots, u_r$ of W. For $T(u_i)$ is in W and is equal to $T'(u_i)$, for $i = 1, \ldots, r$. Hence $T(u_i)$ for $i = 1, \ldots, r$ can be expressed uniquely as a linear combination of $u_1, \ldots, u_r$. If now $\varphi(t)$ and $\varphi'(t)$ denote the characteristic polynomials of T and T', respectively, then (putting $n - r = s$)

    $\varphi(t) = \det (tI_n - a) = \det \begin{pmatrix} tI_r - a' & -b \\ 0 & tI_s - c \end{pmatrix}$
    $\qquad = \det (tI_r - a') \cdot \det (tI_s - c)$
    $\qquad = \varphi'(t) \cdot \det (tI_s - c)$

by Theorem 3.4. Q.E.D.


COROLLARY If p is a k-fold root of the characteristic polynomial of T, then the multiplicity of p as eigenvalue cannot exceed k.
Proof. Let W denote the subspace of U consisting of all the eigenvectors of T belonging to p. Then T maps W into itself, clearly. Let T' denote T restricted to W. If $w_1, \ldots, w_h$ is any base for W, then $T'(w_i) = p w_i$, and so the corresponding matrix of T' is $pI$. Hence the characteristic polynomial of T' is $(t - p)^h$, and this must divide the characteristic polynomial of T. Hence $h \leq k$. Q.E.D.


The following theorem is closely related to Theorem 5.6: 


THEOREM 5.7 Let T be an endomorphism of a finite-dimensional vector space U.
Suppose that U is the direct sum† of subspaces $W_1, W_2, \ldots, W_k$ of positive dimension, and suppose that T maps each $W_j$ into itself ($j = 1, \ldots, k$). Let $T_j$ denote
the mapping T restricted to the elements of $W_j$, and let $\varphi_j(t)$ be the characteristic polynomial of $T_j$, $\varphi(t)$ that of T. Then $\varphi(t) = \varphi_1(t) \cdots \varphi_k(t)$.

Proof. Let $B_1, B_2, \ldots, B_k$ be bases for $W_1, W_2, \ldots, W_k$. Our assumption merely means that all the vectors in $B_1, B_2, \ldots, B_k$, taken together, form a base B for U. Now let a be the matrix of T relative to B, and let $a_{(j)}$ be the matrix of $T_j$ relative to the base $B_j$ ($j = 1, \ldots, k$). Since T applied to the vectors in $B_j$ is the same as $T_j$, it follows at once that

    $a = \begin{pmatrix} a_{(1)} & & & \\ & a_{(2)} & & \\ & & \ddots & \\ & & & a_{(k)} \end{pmatrix}$

where on the right is indicated a matrix with zero entries except for the square matrices $a_{(1)}, \ldots, a_{(k)}$ arranged along the diagonal. Then $tI - a$ is a matrix of the same form, and by repeated applications of Theorem 3.3 we find that

    $\det (tI - a) = \det (tI - a_{(1)}) \cdots \det (tI - a_{(k)})$

where I always stands for the unit matrix of the proper size. The last equation is what we wanted to prove. Q.E.D.


THEOREM 5.8 Let S and T be two endomorphisms of a vector space U, and let p be an
eigenvalue of T. Let W be the subspace consisting of all eigenvectors of T belonging to
p. If S and T commute, then S maps W into itself. Furthermore, S maps the kernel
and image of T into themselves.

Proof. Consider first Ker T: If $T(x) = 0$, then $T \circ S(x) = S \circ T(x) = 0$, showing that $S(x)$ is in Ker T. Hence S maps Ker T into itself. Now W is the kernel of $pI - T$. S commutes with this operator if it commutes with T, and therefore, by what was just shown, S must map W into itself. Finally, if y is in Im T, then $y = T(x)$ for some x. But then $S(y) = ST(x) = T(S(x))$, showing that $S(y)$ is also in Im T. Q.E.D.


† See Definition 5.2, Chap. 8.


The following theorem connects the eigenvalues of a mapping T with those of 
various operators that can be formed from T. 


THEOREM 5.9 Let T be an endomorphism of an n-dimensional vector space U over a
field K, and suppose that the characteristic polynomial $\varphi(t)$ has all its roots in K.
That is, suppose that† $\varphi(t) = (t - p_1) \cdots (t - p_n)$, where $p_1, \ldots, p_n$ are in K,
not necessarily distinct. Let $f(t)$ be any polynomial with coefficients in K, and let
$\varphi_f(t)$ be the characteristic polynomial of the operator $f(T)$. Then $\varphi_f(t)$ also has all its
roots in K, and the set of roots of $\varphi_f(t)$ is the same as the set of elements $f(p_1), \ldots, f(p_n)$. Furthermore, if x is an eigenvector of T belonging to $p_i$, then x is also an eigenvector of $f(T)$ belonging to $f(p_i)$. Finally, if $\det T \neq 0$, then the characteristic polynomial of $T^{-1}$ is $(t - p_1^{-1}) \cdots (t - p_n^{-1})$.
Proof. If $T(x) = px$, then $T^2(x) = T(T(x)) = T(px) = pT(x) = p^2 x$. By a simple induction, it is easy to see that $T^k(x) = p^k x$ for any positive integer k; and this clearly holds also for $k = 0$, since $T^0 = I$. If now $f(t) = c_0 + c_1 t + \cdots + c_r t^r$, then $f(T)(x) = (c_0 I + c_1 T + \cdots + c_r T^r)(x) = c_0 x + c_1 T(x) + \cdots + c_r T^r(x) = c_0 x + c_1 p x + \cdots + c_r p^r x = f(p) \cdot x$, still assuming that $T(x) = px$. Therefore, for each eigenvalue p of T and eigenvector x belonging to it, the scalar $f(p)$ is an eigenvalue of $f(T)$ and x is an eigenvector of $f(T)$ belonging to it.
Now write $S = f(T)$, and let $\varphi_f(t)$ be its characteristic polynomial. Let $g(t)$ be a monic irreducible factor of $\varphi_f(t)$. We want to show that $g(t)$ has degree 1. To do so, let W be the kernel of the operator $g(S)$. That is, W consists of all vectors y such that $g(S)(y) = 0$. Now T clearly commutes with S, and therefore it commutes with any polynomial in S, in particular with $g(S)$. Therefore T maps W into itself, by Theorem 5.8. Let T' denote the mapping T restricted to W, and let $\varphi'(t)$ be its characteristic polynomial. From the corollary of Theorem 4.6 (applied to any matrix representing S), it follows that $\dim W > 0$, and so $\deg \varphi'(t) > 0$. Furthermore, $\varphi'(t)$ divides $\varphi(t)$, by Theorem 5.6. Hence, one of the roots of $\varphi(t)$, say $p_1$, must also be a root of $\varphi'(t)$. There must be a nonzero eigenvector y of T' belonging to $p_1$, by Theorem 5.3. That is, $y \neq 0$ is in W and $T(y) = p_1 y$. But $g(S)(y) = 0$, or $g(f(T))(y) = 0$. By what was shown above, applied to the polynomial $g(f(t))$, we have $g(f(T))(y) = g(f(p_1)) y$. Hence, $g(f(p_1)) = 0$, and so $f(p_1)$ is a root of the polynomial $g(t)$. Therefore $g(t) = t - f(p_1)$, since $g(t)$ is monic and irreducible. Hence, the monic irreducible factors of $\varphi_f(t)$ in $K[t]$ must all occur among the polynomials $t - f(p_1), \ldots, t - f(p_n)$.

Finally, the assertion concerning the characteristic polynomial of $T^{-1}$ follows at once from Exercise 8, Sec. 4. Thus, if a is the matrix of T relative to some base, then $a^{-1}$ is the matrix of $T^{-1}$, and

    $\det (tI - a^{-1}) = \det (t a^{-1} (a - t^{-1} I)) = \det (t a^{-1}) \cdot \det (a - t^{-1} I)$
    $\qquad = (-1)^n t^n \cdot (\det a)^{-1} \cdot \varphi(1/t)$    Q.E.D.


† See Sec. 4, Chap. 6. Recall that $\varphi(t)$ is a monic polynomial of degree n.
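
The first assertions of Theorem 5.9 can be illustrated on a small case. In the sketch below the matrix a (with eigenvalues 1 and 3), the polynomial $f(t) = 1 + t^2$, and all helper names are assumptions chosen for the illustration. It forms $f(a)$, checks that $f(1)$ and $f(3)$ are roots of the characteristic polynomial of $f(a)$, and checks that an eigenvector of a belonging to 1 is an eigenvector of $f(a)$ belonging to $f(1)$.

```python
def det(m):
    """Determinant by expansion along the first row (fine for small matrices)."""
    if len(m) == 1:
        return m[0][0]
    return sum((-1) ** j * m[0][j] * det([row[:j] + row[j + 1:] for row in m[1:]])
               for j in range(len(m)))

def mat_mul(x, y):
    n = len(x)
    return [[sum(x[i][k] * y[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def poly_at_matrix(coeffs, a):
    """Evaluate c_0 I + c_1 a + ... + c_r a^r by Horner's rule."""
    n = len(a)
    result = [[0] * n for _ in range(n)]
    for c in reversed(coeffs):
        result = mat_mul(result, a)
        for i in range(n):
            result[i][i] += c
    return result

def poly_at_scalar(coeffs, p):
    value = 0
    for c in reversed(coeffs):
        value = value * p + c
    return value

a = [[2, 1], [1, 2]]              # eigenvalues 1 and 3
f = [1, 0, 1]                     # f(t) = 1 + t^2, constant term first
fa = poly_at_matrix(f, a)         # the matrix f(a)

# det(f(p) I - f(a)) for each eigenvalue p of a; both values should vanish.
values = [det([[(poly_at_scalar(f, p) if i == j else 0) - fa[i][j]
                for j in range(2)] for i in range(2)]) for p in (1, 3)]

x = [1, -1]                       # eigenvector of a belonging to 1
fa_x = [sum(fa[i][k] * x[k] for k in range(2)) for i in range(2)]
```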


EXERCISES 


1. Find the eigenvalues and eigenvectors of the mapping $Q^2 \to Q^2$ that sends an
arbitrary vector x into ax, where

    $a = \begin{pmatrix} 11 & 6 \\ -20 & -11 \end{pmatrix}$

2. Find the eigenvalues and eigenvectors of the mapping $Q^4 \to Q^4$ that sends an
arbitrary vector x into ax, where
0 -1 1 -1 
0 -1 o 66 
@ 0 -1 0 
1-1 1-2, 
3. Let $P_n$ denote the vector space of polynomials of degree $< n$ in an indeterminate y with coefficients in the real field. Let D be the derivative mapping of $P_n$
defined by

    $D \left( \sum_{k=0}^{n-1} a_k y^k \right) = \sum_{k=1}^{n-1} k a_k y^{k-1}$

Let T be the endomorphism $D \cdot (1 - y^2) D$ of $P_n$ (see Exercise 4, Sec. 4, Chap. 9).
Compute the eigenvalues of T and its eigenvectors of degrees up through 5 (assuming $n > 5$). (These eigenvectors, usually called eigenfunctions in this case,
are the famous Legendre polynomials, apart from constant factors. They are of
importance in many applications.)

4. Find the characteristie polynomial of 1 — 2a + a® ~ al”, a being as in 
Exercise 2. 

5. What are the eigenvalues and eigenvectors of I — D? + 4D"~? in Exercise 3? 

6. Let $T: U \to U$ be an endomorphism of a finite-dimensional vector space,
and let $p_1, p_2, \ldots, p_k$ be distinct eigenvalues of T. Let $W_j$ be the subspace of
all the eigenvectors belonging to $p_j$ ($j = 1, \ldots, k$), and let $B_j$ denote a base for
$W_j$. Prove that all the vectors in $B_1, B_2, \ldots, B_k$, taken jointly, are linearly
independent. (Hint: The proof is similar to that of Theorem 5.4.)


7. Let T be an endomorphism of a space of odd dimension over the real field. 
Prove that T has at least one nonzero eigenvector. 

8. Give proofs for the assertions of Remark 1. 

9. Let a = (a’;) be a real 2 X 2 matrix. Show that the following conditions 
on a are equivalent: 


(a) ai, >0 


; 
(b) a has the eigenvalue 1 with eigenvector $(1, 1, \ldots, 1)$.
A matrix satisfying either of these conditions is called stochastic. 


6. Determinants as volumes 


Theorem 3.1 shows that the determinant of a square matrix gives some measure 
of the linear dependence or independence of its column (or row) vectors. Here we 
shall show that, in the case where the field of scalars is the real field R, the deter- 
minant of a matrix has a natural interpretation in terms of the volume of a certain 
region. 

We deal with a euclidean space E and we denote by T(E) the associated set of mappings (translations) of the set E. Addition of two elements of T(E) is just the operation of composition of those two translations. If p and q are two points of E, then $\overrightarrow{pq}$ denotes the (unique) translation that sends p into q. T(E) is supposed to be a euclidean vector space, meaning that there is assigned to every translation t a non-negative real number $|t|$, the length of t, subject to the requirements of Definition 11.1, Chap. 8. Namely, it is required that there be a base $u_1, \ldots, u_n$ in T(E) such that if a translation x has components $x^1, x^2, \ldots, x^n$ relative to that base, then

6.1    $|x|^2 = \sum_{i=1}^{n} (x^i)^2$

If y is another translation, with components $y^1, \ldots, y^n$, then the inner product of x and y is the number

6.2    $(x, y) = \sum_{i=1}^{n} x^i y^i = \tfrac{1}{2} \left( |x + y|^2 - |x|^2 - |y|^2 \right)$

A base for which (6.1) holds is called orthonormal. An orthonormal base is characterized by

6.3    $(u_i, u_j) = 1$ if $i = j$, and $(u_i, u_j) = 0$ if $i \neq j$

A hyperplane H in E consists of all the points obtained by applying all the translations in an $(n - 1)$-dimensional subspace of T(E) to some given point of E. Hence, if the given point is $p_0$, and if $w_1, \ldots, w_{n-1}$ span the $(n - 1)$-dimensional subspace, then H consists of all the points p such that

6.4    $\overrightarrow{p_0 p} = s^1 w_1 + \cdots + s^{n-1} w_{n-1}$

where $s^1, \ldots, s^{n-1}$ are arbitrary real numbers. For this equation says precisely that p is the result of applying the translation indicated on the right to $p_0$.


Let us now consider a two-dimensional euclidean space $E_2$, and let $p_0, p_1, p_2$ be three noncollinear points in it. If for brevity we write

    $w_1 = \overrightarrow{p_0 p_1} \qquad w_2 = \overrightarrow{p_0 p_2}$

then our assumption that the points are not collinear means that $w_1$ and $w_2$ are linearly independent vectors, hence span the vector space of translations $T(E_2)$. Let P denote the set of all points p such that

6.5    $\overrightarrow{p_0 p} = s^1 w_1 + s^2 w_2 \qquad (0 \leq s^1 \leq 1,\ 0 \leq s^2 \leq 1)$

That is, P consists of all points p obtained by applying the translation $s^1 w_1 + s^2 w_2$ to $p_0$, the numbers $s^1$ and $s^2$ being confined to the interval from 0 to 1. We shall call P the parallelogram determined by $p_0, p_1, p_2$, or by $p_0, w_1, w_2$.†

It is not difficult to convince oneself that P in fact corresponds precisely to one's intuitive notion of a parallelogram. It is illustrated in Fig. 1. The four vertices are the points obtained by putting 0, 1 for $s^1$ and $s^2$ in (6.5). For example, putting $s^1 = 1$, $s^2 = 0$ we get $\overrightarrow{p_0 p} = w_1 = \overrightarrow{p_0 p_1}$, whence $p = p_1$. The vertex $p_3$ is the point obtained by putting $s^1 = 1$ and $s^2 = 1$ in (6.5). Thus $\overrightarrow{p_0 p_3} = \overrightarrow{p_0 p_1} + \overrightarrow{p_0 p_2}$. The edges of P consist of the points obtained by putting $s^1$ or $s^2$ equal to 0 or 1 in (6.5). For example, the edge joining $p_2$ and $p_3$ is obtained by setting $s^2 = 1$ and therefore consists of all points p such that

    $\overrightarrow{p_0 p} = s w_1 + w_2 \qquad (0 \leq s \leq 1)$

where we have written simply s for the variable $s^1$. Since $\overrightarrow{p_0 p} - w_2 = \overrightarrow{p_0 p} - \overrightarrow{p_0 p_2} = \overrightarrow{p_2 p}$, our equation becomes

6.6    $\overrightarrow{p_2 p} = s w_1 \qquad (0 \leq s \leq 1)$

That is, the edge in question consists of all points p obtained by applying the translations $s w_1$ ($0 \leq s \leq 1$) to $p_2$. By definition, (6.6) determines a segment of a line, namely the unique line through $p_2$ and $p_3$. [Equation (6.6) is a special case of (6.4), except for the restriction on s.] In a similar way one sees that the other three edges of P are segments of lines.

We recall that a euclidean coordinate system in $E_2$ is obtained by choosing an orthonormal base $u_1, u_2$ in the space of translations, along with a point in $E_2$ as origin. Let us take $p_0$ as origin. If p is any point, then its coordinates are the unique real numbers $x^1, x^2$ such that

6.7    $\overrightarrow{p_0 p} = x^1 u_1 + x^2 u_2$

Let the coordinates of $p_1$ and $p_2$ be $a^1_1, a^2_1$ and $a^1_2, a^2_2$, respectively. That is, (6.7) becomes

6.8    $w_1 = a^1_1 u_1 + a^2_1 u_2 \qquad w_2 = a^1_2 u_1 + a^2_2 u_2$

The coordinates of $p_1, p_2$ are the same as the components of $w_1, w_2$ (relative to the given orthonormal base). Equation (6.5) becomes

    $x^1 u_1 + x^2 u_2 = s^1 (a^1_1 u_1 + a^2_1 u_2) + s^2 (a^1_2 u_1 + a^2_2 u_2)$

Since $u_1, u_2$ are linearly independent, we can equate coefficients here, getting

6.9    $x^1 = s^1 a^1_1 + s^2 a^1_2$
       $x^2 = s^1 a^2_1 + s^2 a^2_2 \qquad (0 \leq s^1 \leq 1,\ 0 \leq s^2 \leq 1)$

This is the coordinate version of the vector equation (6.5).


† P depends in general upon the order in which the three points appear.


Figure 1

Our aim is to see how to compute the area of P, with the purpose in view of carrying our results over into higher dimensional spaces. Now the first problem that comes up is that of defining area to begin with. It is outside our province to go into that question in detail. Indeed, one of the problems solved by calculus is precisely that of giving a workable definition of area for rather arbitrary plane figures (and, more generally, of volumes of figures in spaces of higher dimension). Since we shall deal here only with parallelograms and analogous figures of higher dimension, we shall give a definition of area (or volume) applicable to them, and we shall try to show at the same time that our definition meets reasonable requirements consistent with intuitive prejudices.

We claim that the quantity

6.10    $\left| \det \begin{pmatrix} a^1_1 & a^1_2 \\ a^2_1 & a^2_2 \end{pmatrix} \right|$    (absolute value of the determinant!)

formed from the coordinate vectors $a_1, a_2$ of $w_1, w_2$ has all the qualifications entitling it to be called the area of P. To see this, consider Fig. 2.

Figure 2

Here we have associated with P another parallelogram P’, with vertices po, p1, 
py ps, such that the angles between its edges are right angles, that is, P’ is a rec- 
tangle. As the diagram shows, P’ is obtained from P by cutting off a triangle on 
the right and pasting it on the left of the figure.t It is also suggestive to think of 
P’ as obtained from P by a “shearing motion,” causing the upper edge of P to 
slide along itself to the left. It is therefore reasonable to require that our definition 
of area give the same reault for both P and P’. 

Now if the point p; has coordinates b! and b?, then the quantity corresponding 
to (6.10) for P’ is 


1 Bt 
611 | det (: ie 
a, BP 

We calculate this as follows: Since P’ is by assumption a rectangle, the vectors 

pop, and pops are orthogonal. Their components relative to the orthonormal base 


un, w are the elements in the two columns a, andb of (6.11), For the inner product 
of the two vectors we have then 


ab! + ab? = 0 
Hence to compute (6.11) we have only to use Theorem 6.1, and the result is 
612 [eaty? + (a%)*8 - (4)? + (6°77 = [poral « [pops 


by (11.12), Chap. 8. In other words, (6.11) is simply the product of the two sides 
of the rectangle P’, and we naturally define that quantity to be the area of P’. 

As pointed out above, it is intuitively clear that P and P’ should have the same 
area, and therefore we take (6.11) as the area of the original parallelogram P. 
Our last step consists of showing that (6.10) and (6.11) are equal. For that purpose 


t That is not correet if p, and p; fall on the same side of ps. In this case similar considera 
tions still apply. 




we must determine the relationship of the column vector $\mathbf{b}$ in (6.11) to the columns 
$\mathbf{a}_1, \mathbf{a}_2$ in (6.10). Now $p_5$ is on the line through $p_2$ and $p_3$. All points on that line 
are given by (6.6), with the restriction on $s$ omitted, and so we must have $\overrightarrow{p_2p_5} = s\mathbf{w}_1$ 
for some value of $s$. Since $\overrightarrow{p_2p_5} = \overrightarrow{p_0p_5} - \overrightarrow{p_0p_2}$ we have 

$\overrightarrow{p_0p_5} = s\mathbf{w}_1 + \mathbf{w}_2$ 

(We do not need to know what $s$ is here.) The corresponding equation for the 
component vectors of the three vectors in question is 

$\mathbf{b} = s\mathbf{a}_1 + \mathbf{a}_2$ 

Therefore (6.10) and (6.11) are equal, by Proposition 2.2. 

In particular, it follows that (6.10) does not depend upon the choice of orthonormal 
base in $T(E)$, since the value of (6.11) is independent of that choice, by 
(6.12). If the three points $p_0$, $p_1$, $p_2$ are collinear, then $\mathbf{w}_1$ and $\mathbf{w}_2$, and hence also $\mathbf{a}_1$ and 
$\mathbf{a}_2$, must be linearly dependent. The parallelogram $P$ collapses, and (6.10) vanishes. 
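
The argument above can be checked numerically. The following is only a sketch under stated assumptions: the coordinate vectors `a1`, `a2` are hypothetical sample values, and the helper names are ours, not the text's. It chooses $s$ so that $\mathbf{b} = s\mathbf{a}_1 + \mathbf{a}_2$ is orthogonal to $\mathbf{a}_1$ and compares (6.10), (6.11), and the product of the sides of the rectangle.

```python
from fractions import Fraction

def det2(a, b):
    """Determinant of the 2 x 2 matrix whose columns are the vectors a and b."""
    return a[0] * b[1] - a[1] * b[0]

def area(a1, a2):
    """Area of the parallelogram spanned by a1 and a2, in the sense of (6.10)."""
    return abs(det2(a1, a2))

def dot(a, b):
    """Inner product of two coordinate vectors relative to an orthonormal base."""
    return a[0] * b[0] + a[1] * b[1]

def shear_to_rectangle(a1, a2):
    """Return b = s*a1 + a2, with s chosen so that b is orthogonal to a1."""
    s = Fraction(-dot(a1, a2), dot(a1, a1))
    return (s * a1[0] + a2[0], s * a1[1] + a2[1])

a1, a2 = (3, 4), (1, 2)          # hypothetical coordinate vectors of w1, w2
b = shear_to_rectangle(a1, a2)   # coordinate vector of the sheared edge

print(area(a1, a2))              # the quantity (6.10) for P
print(area(a1, b))               # the quantity (6.11) for the rectangle P'
print(dot(a1, b))                # inner product of the two sides of P'
print(dot(a1, a1) * dot(b, b))   # square of the product of the side lengths
```

Exact rational arithmetic is used so that the equality of the two areas is not blurred by rounding.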


We now give the theorem used in getting from (6.11) to (6.12): 


THEOREM 6.1 Let $a = (a^i_j)$ be an $n \times n$ matrix with coefficients in a field $K$, and 
suppose that 

6.13    $\sum_{j} a^j_i a^j_k = 0$    if $i \neq k$ 

then 

$(\det a)^2 = \prod_{i=1}^{n} \left[(a^1_i)^2 + \cdots + (a^n_i)^2\right]$ 

Proof. Put $b = {}^t a \cdot a$. Then, by definition, 

$b^i_k = \sum_{j} ({}^t a)^i_j \, a^j_k = \sum_{j} a^j_i a^j_k$ 

and so $b$ is a diagonal matrix. We have $\det b = \det({}^t a \cdot a) = (\det {}^t a) \cdot 
(\det a) = (\det a)^2$, and $\det b = b^1_1 \cdot b^2_2 \cdots b^n_n$, which is what we wanted 
to prove. 
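
As an illustration of Theorem 6.1, here is a small sketch in which the matrix `a` is a hypothetical example with mutually orthogonal columns. It forms $b = {}^t a \cdot a$, confirms that $b$ is diagonal, and compares $(\det a)^2$ with the product of the diagonal entries. The determinant routine is a plain first-row expansion, chosen for brevity rather than efficiency.

```python
def det(m):
    """Determinant by expansion along the first row (adequate for small n)."""
    if len(m) == 1:
        return m[0][0]
    total = 0
    for j, entry in enumerate(m[0]):
        minor = [row[:j] + row[j + 1:] for row in m[1:]]
        total += (-1) ** j * entry * det(minor)
    return total

def transpose(m):
    """Transpose of a matrix given as a list of rows."""
    return [list(col) for col in zip(*m)]

def matmul(x, y):
    """Product of two matrices given as lists of rows."""
    return [[sum(p * q for p, q in zip(row, col)) for col in zip(*y)] for row in x]

# Hypothetical matrix whose columns are mutually orthogonal, as required by (6.13).
a = [[1, 2, 2],
     [2, 1, -2],
     [2, -2, 1]]

b = matmul(transpose(a), a)      # b = (transpose of a) * a, as in the proof

diag_product = 1
for i in range(len(b)):
    diag_product *= b[i][i]      # product of the squared column lengths

print(b)
print(det(a) ** 2, diag_product)
```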


We shall now show briefly how these considerations allow us to define volumes 
of analogous figures in euclidean spaces of any number of dimensions. 

To do so, let $p_0, p_1, \dots, p_n$ be $n + 1$ points in an $n$-dimensional euclidean 
space $E_n$. For brevity of notation, let us write $\mathbf{w}_i$ for the translation vector $\overrightarrow{p_0p_i}$ in 
$T(E_n)$. Following (6.5) we define the parallelotope $P$ determined by the $(n + 1)$-tuple 
$p_0, p_1, \dots, p_n$ (or by $p_0, \mathbf{w}_1, \dots, \mathbf{w}_n$) to be the set of all points $p$ such that 

6.14    $\overrightarrow{p_0p} = s^1\mathbf{w}_1 + s^2\mathbf{w}_2 + \cdots + s^n\mathbf{w}_n$    ($0 \le s^j \le 1$ for $j = 1, \dots, n$) 




Equation (6.14) sets up a one-to-one correspondence between the points of $P$ 
and the points of the unit cube in $R^n$ consisting of all $n$-tuples $(s^1, \dots, s^n)$ for 
which $0 \le s^j \le 1$ ($j = 1, \dots, n$). For each index $j = 1, 2, \dots, n$ the point 
set $P$ has two faces, one consisting of all points $p$ for which $s^j$ in (6.14) is equal to 
zero, the other consisting of all $p$ for which $s^j$ is equal to 1. The $2n$ faces of $P$ 
correspond in an obvious way to the four edges of the parallelogram considered 
above. Each face lies in a hyperplane of $E_n$, as is easily verified. The vertices of 
$P$ are the $2^n$ points obtained by putting all the $s^j$ in (6.14) equal (independently) 
to 0 or 1. The given points $p_0, p_1, \dots, p_n$ are clearly among the vertices. $P$ is 
called rectangular if the $n$ vectors $\mathbf{w}_1, \dots, \mathbf{w}_n$ in (6.14) are mutually orthogonal. 
If they are moreover all of the same length, then $P$ is called an $n$-cube. 

Now let $\mathbf{u}_1, \mathbf{u}_2, \dots, \mathbf{u}_n$ be an orthonormal base for the vector space of translations 
$T(E_n)$. Let $\mathbf{a}_i$ be the component vector† of $\mathbf{w}_i$. That is, 

6.15    $\mathbf{w}_i = a^j_i \mathbf{u}_j$    ($i = 1, \dots, n$) 

We now define the volume of the parallelotope $P$ to be the number 

6.16    $\operatorname{vol} P = |\det a|$ 

where $a = (\mathbf{a}_1, \dots, \mathbf{a}_n)$ is the $n \times n$ matrix built from the vectors $\mathbf{a}_i$. 

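First of all, let us verify that (6.16) does not depend on the particular choice of 
orthonormal base. If $\mathbf{v}_1, \dots, \mathbf{v}_n$ is a second orthonormal base, and if $\mathbf{b}_1, \dots, \mathbf{b}_n$ 
are the component vectors of $\mathbf{w}_1, \dots, \mathbf{w}_n$ relative to it, then we have 

6.17    $\mathbf{w}_i = b^j_i \mathbf{v}_j$    ($i = 1, \dots, n$) 

just as in (6.15). Now let $c = (c^i_j)$ be the matrix from $\{\mathbf{u}_i\}$ to $\{\mathbf{v}_i\}$. Thus, 

6.18    $\mathbf{v}_i = c^j_i \mathbf{u}_j$    ($i = 1, \dots, n$) 

Then from (6.15), (6.17), (6.18) there follows at once $a^i_k = c^i_j b^j_k$, or $a = cb$, 
where $b = (\mathbf{b}_1, \dots, \mathbf{b}_n)$ is the matrix whose columns are the vectors $\mathbf{b}_1, \dots, \mathbf{b}_n$. 
Since $\mathbf{v}_1, \dots, \mathbf{v}_n$ is an orthonormal base, we have 

$(\mathbf{v}_i, \mathbf{v}_k) = \begin{cases} 0 & \text{if } i \neq k \\ 1 & \text{if } i = k \end{cases}$ 

by (6.3), and so 

$\sum_{j} c^j_i c^j_k = \begin{cases} 0 & \text{if } i \neq k \\ 1 & \text{if } i = k \end{cases}$ 

by (6.2). Therefore, by Theorem 6.1, $(\det c)^2 = 1$. Then $|\det a| = |\det (cb)| = 
|\det c| \cdot |\det b| = |\det b|$. This shows that (6.16) does not depend upon the 
choice of orthonormal base. 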
† The numbers constituting $\mathbf{a}_i$ are the corresponding coordinates of the point $p_i$ if we 
take $p_0$ as origin. 




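Secondly, let us show that (6.16) gives the "correct" result for a rectangular 
parallelotope. If $P$ is rectangular, then, by definition, $(\mathbf{w}_i, \mathbf{w}_k) = 0$ if $i \neq k$. 
That is, 

$\sum_{j} a^j_i a^j_k = \begin{cases} 0 & \text{if } i \neq k \\ |\mathbf{w}_i|^2 & \text{if } i = k \end{cases}$ 

by (6.2). Hence, by Theorem 6.1, we have in this case 

$\det a = \pm\, |\mathbf{w}_1| \cdot |\mathbf{w}_2| \cdots |\mathbf{w}_n|$ 

$\phantom{\det a} = \pm\, |\overrightarrow{p_0p_1}| \cdot |\overrightarrow{p_0p_2}| \cdots |\overrightarrow{p_0p_n}|$ 

Therefore, (6.16) gives the product of the dimensions of $P$ as the volume. 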
Finally, let us observe that, if $P$ is deformed by a shearing motion, its volume 
as defined by (6.16) does not change. The argument here depends upon a mathematical 
interpretation of "shearing motion," and we shall appeal to the situation 
of Fig. 2. Thus, suppose that $\mathbf{w}_n$ is replaced by a new vector $\mathbf{w}'_n$, and let $P'$ denote 
the new parallelotope (6.14). We shall say that $P'$ is obtained from $P$ by a shear 
if the hyperplanes $H$ and $H'$ containing the faces of $P$ and $P'$ determined by $s^n = 1$ 
are identical. Suppose then that $p$ is a point of $H$. That is, by (6.14), we have 

$\overrightarrow{p_0p} = s^1\mathbf{w}_1 + \cdots + s^{n-1}\mathbf{w}_{n-1} + \mathbf{w}_n$ 

Now $H$ and $H'$ are identical if and only if $p$ is also a point of $H'$, and if that is so, 
there are numbers $t^1, \dots, t^{n-1}$ such that 

$\overrightarrow{p_0p} = t^1\mathbf{w}_1 + \cdots + t^{n-1}\mathbf{w}_{n-1} + \mathbf{w}'_n$ 

Equating these two expressions we find that 

$\mathbf{w}'_n = \mathbf{w}_n + c^1\mathbf{w}_1 + \cdots + c^{n-1}\mathbf{w}_{n-1}$ 

for certain numbers $c^1, \dots, c^{n-1}$. If $\mathbf{a}'_n$ is the component vector of $\mathbf{w}'_n$ relative 
to the base $\mathbf{u}_1, \dots, \mathbf{u}_n$, then we have 

6.19    $\mathbf{a}'_n = \mathbf{a}_n + c^1\mathbf{a}_1 + \cdots + c^{n-1}\mathbf{a}_{n-1}$ 

Now the volume of $P'$, given by (6.16), is $|\det (\mathbf{a}_1, \dots, \mathbf{a}_{n-1}, \mathbf{a}'_n)|$, and from (6.19) 
it follows that this is equal to the volume of $P$. In the same way we can treat a 
shear involving any of the other vectors $\mathbf{w}_1, \dots, \mathbf{w}_n$. It is not hard to see 
that $P$ can be deformed into a rectangular parallelotope by a series of appropriate 
shears. By what was just shown, its volume, as defined by (6.16), is thereby 
unaltered. 
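
The invariance of (6.16) under a shear (6.19) can be illustrated by a short computation. This is a sketch only: the component vectors and the shear coefficients below are hypothetical sample values, and the function names are ours.

```python
def det(m):
    """Determinant by expansion along the first row (adequate for small n)."""
    if len(m) == 1:
        return m[0][0]
    total = 0
    for j, entry in enumerate(m[0]):
        minor = [row[:j] + row[j + 1:] for row in m[1:]]
        total += (-1) ** j * entry * det(minor)
    return total

def volume(vectors):
    """|det a|, where the columns of a are the given component vectors, as in (6.16)."""
    matrix = [list(row) for row in zip(*vectors)]   # the vectors become columns
    return abs(det(matrix))

def shear_last(vectors, c):
    """Replace the last vector a_n by a_n + c[0]*a_1 + ... + c[n-2]*a_(n-1), as in (6.19)."""
    *rest, last = vectors
    new_last = list(last)
    for coeff, vec in zip(c, rest):
        new_last = [x + coeff * y for x, y in zip(new_last, vec)]
    return rest + [new_last]

# Hypothetical component vectors a_1, a_2, a_3 of a parallelotope in 3-space.
vectors = [[1, 0, 0], [1, 2, 0], [1, 1, 3]]
sheared = shear_last(vectors, [2, -5])

print(volume(vectors), volume(sheared))   # the two volumes agree
```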


EXERCISES. 
1. Use (6.10) to show that the area of a parallelogram determined by $p_0$, $p_1$, $p_2$ is 
equal to $|\overrightarrow{p_0p_1}| \cdot |\overrightarrow{p_0p_2}| \cdot \sin\theta$, where $\theta$ is the angle $p_1p_0p_2$. 
2. Let $p_0$, $p_1$, $p_2$, $p_3$ be points of a euclidean 3-space, and let their coordinates 
relative to some euclidean coordinate system be (2, 0, 3), (0, 1, 2), (-1, -2, 8), 
(4, 1, 2), respectively. Compute the volume of the parallelotope $P$ determined by 
$p_0$, $p_1$, $p_2$, $p_3$. Compute the area of the face determined by $p_0$, $p_1$, $p_2$. Which of the 
following points are in $P$: (14, 0, 2), (5, 4, -2), (-8, -1, 2)? Give the coordinates 
of the vertices of $P$. 

3. Let (2, 1, -1, 0), (4, 1, 0, 3), (2, -2, 0, 0), (0, -1, 2, 3) be the coordinates of 
four points in $E_4$ relative to some euclidean coordinate system. Compute the 
volume of the three-dimensional parallelotope that they determine. 

4. Let $P$ be a parallelotope in $E_n$. Prove that the volume of $P$ is equal to the 
product of the volume of a face of $P$ times the perpendicular distance to the hyperplane 
containing the opposite face. 


12 


Rings of operators and differential 
equations 


1. Introduction 


By viewing the differentiation operator $d/dx$ of calculus as an endomorphism of a 
certain vector space we shall be able to apply some of the results of earlier chapters 
to prove some important theorems about differential equations. The necessary 
definitions are given in Sec. 4, and no detailed knowledge of calculus is required. 
In Secs. 2 and 3 we recall briefly some definitions and simple facts concerning rings 
and homomorphisms, in particular homomorphisms resulting from the substitution 
of operators in polynomials. The reader is referred to Chaps. 1, 2, 6, and 10 
for a more elaborate account. 


2. Rings and homomorphisms 


First of all, a ring $A$ is a set of elements equipped with two binary operations, 
which we call sum and product, satisfying the following conditions: 
(1) $A$ with the sum operation is an abelian group. 
(2) The product operation is associative. 
(3) $A$ contains a unit element (that is, an identity element for the product operation).† 
(4) The sum and product operations satisfy the distributive law. That is, $(a + b)c 
= ac + bc$ and $c(a + b) = ca + cb$ for any elements $a$, $b$, $c$ of $A$. 

As usual, we denote the sum of two elements by $a + b$, their product by $ab$. 
The first condition requires $A$ to contain a zero element 0 (that is, the identity 
element for the + operation), and $A$ must contain the negative $-a$ of any element 
$a$ ($-a$ is the inverse of $a$ for the + operation). We have $0 \cdot a = a \cdot 0 = 0$ for 
any element $a$ of $A$. 

If $e$ is the unit element of $A$, necessarily unique, then $ea = ae = a$ for any $a$ in $A$. 
An element $a$ is called invertible if there is an element $a'$ in $A$ such that $aa' = a'a = 
e$. If that is so, then $a'$ is unique and is called the inverse of $a$ (relative to the 
product operation). It is denoted by $a^{-1}$. Since $a$ must then be the inverse of $a'$, 


† This condition is omitted in some texts. 




that is of $a^{-1}$, we have $(a^{-1})^{-1} = a$. The unit element $e$ is invertible, and $e^{-1} = e$. 
If $a$, $b$ are both invertible, then so is $ab$, and $(ab)^{-1} = b^{-1}a^{-1}$. The zero element 0 
of $A$ is not invertible unless it is the only element in $A$. $A$ is said to be a commutative 
ring if the product operation is commutative ($ab = ba$ for any two elements 
of $A$). We recall that a field is a commutative ring in which every nonzero element 
is invertible. 

For any element $b$ of a ring $A$ and any positive integer $m$, the symbol $mb$ denotes 
the element obtained by adding $b$ to itself $m$ times. Then $(-m)b$ is defined to be 
the element $m \cdot (-b)$. For the integer 0 we put $0 \cdot b = 0$ = zero element of $A$. 

In a similar way, $b^m$ denotes the element obtained by forming the product of $b$ 
with itself $m$ times, $m$ being a positive integer. We put $b^0 = e$ = unit element of 
$A$. If $b$ is invertible, then we define $b^{-m}$ by the equation $b^{-m} = (b^{-1})^m$. The usual 
rules for exponents hold. We refer to Sec. 7, Chap. 2, for a precise account of the 
operations symbolized by $mb$ and $b^m$. 


EXAMPLE 1 For our purposes we are particularly concerned with the following 
example, studied in Sec. 4, Chap. 9. Let $V$ be a vector space over a field $K$, and 
let $E$ denote the set of all linear mappings of $V$ to itself. We have defined two 
binary operations in $E$ as follows: Let $S$ and $T$ be elements of $E$. Then $S + T$ 
and $S \circ T$ denote the linear mappings of $V$ defined by 

2.1    $(S + T)(\mathbf{x}) = S(\mathbf{x}) + T(\mathbf{x})$ 
       $(S \circ T)(\mathbf{x}) = S(T(\mathbf{x}))$ 

for any vector $\mathbf{x}$ in $V$. $E$ with these two operations is a ring, as is easily verified. 
Its zero element is the mapping 0 that sends every vector of $V$ into the zero vector. 
The unit element is the identity mapping $I$. A third operation was defined in 
Chap. 9. Namely, if $c$ is any element of the field $K$, and if $T$ is any element of $E$, 
then $cT$ denotes the linear mapping of $V$ defined by 

2.2    $(cT)(\mathbf{x}) = c \cdot T(\mathbf{x})$ 

for any vector $\mathbf{x}$ in $V$. $E$ with this operation and the first operation of (2.1) is a 
vector space over $K$. The set $E$ with all three operations is an example of what 
we have called a $K$-algebra. $E$ is called the algebra (or ring) of endomorphisms of $V$ 
and is sometimes denoted by End($V$). Since $E$ is a ring, the exponent notation 
discussed above is applicable. If $T$ is any element of $E$ and $m$ a positive integer, 
then it follows from (2.1) that $T^m$ is the mapping of $V$ obtained by applying $T$ 
$m$ times in succession. 


EXAMPLE 2 $K$ being a field, let $K^n_n$ be the set of all $n \times n$ matrices with coefficients 
in $K$. Then $K^n_n$ is a ring with the sum and product operations defined in Sec. 6, 
Chap. 9, and is a $K$-algebra with the third operation of scalar multiplication. 

A ring B is called a subring of a ring A if the elements of B are all in A and if 
the sum and product operations of A, applied to elements of B, are the same as 
the sum and product operations of B. 




If $A'$ is a subset of a ring $A$ and if $A'$ contains the unit element $e$ of $A$, along with 
the sum, difference, and product of any two of its elements, then $A'$ becomes a 
subring of $A$ if the sum and product of two elements of $A'$ are defined to be the 
same as in $A$. For example, $A'$ must then contain $e - e = 0$ and must also contain 
$0 - a = -a$ for any $a$ in $A'$. Condition (1) above is then easy to verify, and the 
other three conditions are trivial to check. 


EXAMPLE 3 Let End($V$) be the ring of endomorphisms of a vector space $V$ over 
a field $K$, and let $K'$ consist of all the elements $cI$, $c$ being an arbitrary scalar (the 
mappings $cI$ of $V$ to itself are called scalar mappings). Then $K'$ is a subring of 
End($V$). Furthermore, the mapping $K \to K'$ defined by $c \to cI$ is one-to-one, 
provided that dim $V \neq 0$. $K'$ is then a field, and the mapping just defined is an 
isomorphism from $K$ to $K'$. 


EXAMPLE 4 $K^n_n$ being as in Example 2 above, let $K''$ consist of all the matrices 
$cI$, where $c$ is in $K$ and $I$ is the $n \times n$ unit matrix (the matrices $cI$ are called scalar 
matrices). Then $K''$ is a subring of $K^n_n$. It is, in fact, a field, and the mapping 
$K \to K''$ defined by $c \to cI$ is an isomorphism. 


EXERCISES 

1. Referring to Example 3, show that every element of End($V$) commutes with 
every element of $K'$. Do the same for $K^n_n$ and $K''$ of Example 4. 

2. An element $a$ of a ring $A$ is called nilpotent if $a^m = 0$ for some positive integer 
$m$. Suppose that $A$ is commutative, and let $N$ be the set of all nilpotent elements 
in $A$. Prove that the sum, difference, and product of two elements of $N$ is again 
in $N$. 

3. Let $a$ be a nilpotent element of a ring $A$, and let $e$ be the unit element. Prove 
that $e - a$ is invertible. [Hint: Show that $e + a + a^2 + \cdots + a^k$ is its inverse 
if $k$ is large enough.] 

4. Let $A$ be a ring, and let $S$ be a subset containing the unit element. Let $B$ 
consist of all elements of $A$ which can be expressed as (finite) sums of products of 
the type 

$s_1 s_2 \cdots s_r$ 

where $s_1, \dots, s_r$ denote arbitrary elements of $S$ (and $r$ an arbitrary integer). 
Show that $B$ is a subring of $A$ (called the subring generated by $S$). Show that if 
an element $a$ of $A$ commutes with every element $s$ of $S$ (that is, $sa = as$), then $a$ 
commutes with every element of $B$. 

5. Let $a$, $b$ be elements of a ring $A$ such that $ab = ba = 0$. Prove that $(a + b)^k = 
a^k + b^k$ for any integer $k > 0$. 

6. An element $a$ of a ring $A$ is called idempotent if $a^2 = a$. Let $a$, $b$ be two 
idempotent elements such that $ab = ba$. Prove that the subring of $A$ generated 
by $e$, $a$, $b$ consists of all elements $m_0 e + m_1 a + m_2 b + m_3 ab$, where $m_0, \dots, m_3$ 
are integers. 


3. Homomorphisms of rings 


A homomorphism of two rings $A$, $A'$ is a mapping $f: A \to A'$ satisfying the following 
conditions: 

3.1    $f(a + b) = f(a) + f(b)$ 
       $f(ab) = f(a) \cdot f(b)$ 

for any two elements $a$, $b$ of $A$. From the first equation it follows that $f(0) = 0$ 
and $f(-a) = -f(a)$ for any $a$ in $A$ (see Sec. 2, Chap. 2). The homomorphism $f$ 
is called unitary if it sends the unit element $e$ of $A$ into the unit element $e'$ of $A'$, 
$f(e) = e'$. If that is so, then $f$ maps $A$ onto a subring of $A'$, called the image of $f$. 
For then $e'$ is in the image, by assumption, and if $f(a)$, $f(b)$ are any two elements 
in the image, then $f(a) + f(b)$ and $f(a) \cdot f(b)$ are also in the image, being equal to 
$f(a + b)$ and $f(ab)$, respectively. We shall encounter only unitary homomorphisms 
in this chapter. 

By repeated applications of (3.1) it is easy to verify that if an element $a$ of $A$ is 
expressed in any way as a (finite) combination 

3.2    $a = (a_1 \cdots a_r) \pm (b_1 \cdots b_s) \pm \cdots$ 

of sums (or differences) and products, then 

3.3    $f(a) = (f(a_1) \cdots f(a_r)) \pm (f(b_1) \cdots f(b_s)) \pm \cdots$ 

A homomorphism $f$ is called an isomorphism if it is one-to-one. If that is so, 
then the inverse mapping $f^{-1}$ is also an isomorphism. 


EXAMPLE 1 As already mentioned in Examples 3 and 4 of the preceding section, 
the mappings $K \to K'$ and $K \to K''$ are isomorphisms. 


EXAMPLE 2 Let $V$ be an $n$-dimensional vector space over a field $K$ ($n > 0$), and 
let $B$ be a base for $V$. Then the mapping End($V$) $\to K^n_n$ that sends an endomorphism 
$T$ of $V$ into its matrix relative to the base $B$ is an isomorphism (Theorem 6.4, 
Chap. 9). 

The applications we have in view in this chapter center round the following 
situation: Let $g(t)$ be a polynomial in an indeterminate $t$ with coefficients in a 
field $K$, and let $T: V \to V$ be a linear mapping of a vector space $V$ over $K$. If 
$g(t) = b_0 + b_1 t + \cdots + b_m t^m$, we recall that $g(T)$ denotes the linear mapping 

3.4    $g(T) = b_0 I + b_1 T + \cdots + b_m T^m$ 

of $V$. The operation $g(t) \to g(T)$ defined in this way is a unitary ring-homomorphism 
$K[t] \to$ End($V$) (cf. Theorem 4.4, Chap. 11). The verification is straightforward. 
If $a$ denotes an $n \times n$ matrix with coefficients in $K$, then $g(a)$ denotes 
the $n \times n$ matrix 

3.5    $g(a) = b_0 I + b_1 a + \cdots + b_m a^m$ 

where $I$ here is the $n \times n$ unit matrix. The operation $g(t) \to g(a)$ defined in this 
way is a unitary ring-homomorphism $K[t] \to K^n_n$, and again the verification is 
straightforward. Homomorphisms obtained in this way by substituting a linear 
transformation or a matrix for a variable in a polynomial will be called substitution 
homomorphisms. Examples of them have occurred several times in earlier chapters. 

It will be convenient to denote the substitution homomorphisms defined by 
(3.4) and (3.5) by $S_T$ and $S_a$, respectively. 
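
The homomorphism property of $S_a$ can be tried out on a small case. The sketch below rests on assumptions of our own: polynomials are stored as coefficient lists `[b0, b1, ..., bm]`, matrices as lists of rows, and the particular `a`, `g`, `h` are hypothetical sample values. It evaluates (3.5) by Horner's scheme and compares $S_a(gh)$ with $S_a(g) \cdot S_a(h)$.

```python
def identity(n):
    """The n x n unit matrix."""
    return [[1 if i == j else 0 for j in range(n)] for i in range(n)]

def matmul(x, y):
    """Product of two square matrices given as lists of rows."""
    return [[sum(p * q for p, q in zip(row, col)) for col in zip(*y)] for row in x]

def matadd(x, y):
    """Entrywise sum of two matrices."""
    return [[p + q for p, q in zip(r, s)] for r, s in zip(x, y)]

def substitute(g, a):
    """S_a(g) = g[0]*I + g[1]*a + ... + g[m]*a^m, computed by Horner's scheme."""
    n = len(a)
    result = [[0] * n for _ in range(n)]
    for coeff in reversed(g):
        result = matmul(result, a)
        result = [[result[i][j] + (coeff if i == j else 0) for j in range(n)]
                  for i in range(n)]
    return result

def poly_mul(g, h):
    """Product of two polynomials given as coefficient lists."""
    out = [0] * (len(g) + len(h) - 1)
    for i, p in enumerate(g):
        for j, q in enumerate(h):
            out[i + j] += p * q
    return out

def poly_add(g, h):
    """Sum of two polynomials given as coefficient lists."""
    size = max(len(g), len(h))
    g = g + [0] * (size - len(g))
    h = h + [0] * (size - len(h))
    return [p + q for p, q in zip(g, h)]

a = [[1, 2], [3, 4]]     # hypothetical 2 x 2 matrix
g = [1, 0, 2]            # g(t) = 1 + 2t^2
h = [-3, 1]              # h(t) = -3 + t

print(substitute(poly_mul(g, h), a) == matmul(substitute(g, a), substitute(h, a)))
print(substitute(poly_add(g, h), a) == matadd(substitute(g, a), substitute(h, a)))
print(substitute([1], a) == identity(2))   # the unit element goes to the unit matrix
```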

It was pointed out in Sec. 4, Chap. 11, that the substitution homomorphism 
$S_T$ affords a simple method for computing inverses. We recall the method here. 

$T$ being an endomorphism of a vector space $V$ over a field $K$, suppose that 
$g(T) = 0$ for some polynomial $g(t)$ in $K[t]$. For example, $g(t)$ can be taken as the 
characteristic polynomial of $T$, if $V$ is of finite dimension.† If $f(t)$ is a polynomial 
which is relatively prime to $g(t)$, then $f(T)$ has an inverse. For by Theorem 3.4, 
Chap. 6, there exist polynomials $a(t)$ and $d(t)$ in $K[t]$ such that 

3.6    $a(t)f(t) + d(t)g(t) = 1$ 

Applying the substitution homomorphism $S_T$ to this equation we get [cf. (3.2) 
and (3.3)] 

3.7    $a(T)f(T) + d(T)g(T) = I$ 

Since $g(T) = 0$, by assumption, (3.7) becomes 

3.8    $a(T)f(T) = I$ 

Now $a(t)f(t) = f(t)a(t)$, and $S_T$ applied to this equation gives $a(T)f(T) = f(T)a(T)$, 
whence $f(T)a(T) = I$. Therefore $a(T)$ is indeed the inverse of $f(T)$. The polynomial 
$a(t)$ can be calculated by the methods of Sec. 3, Chap. 6. It is clear that a 
similar discussion holds for matrices in place of linear operators. 


EXAMPLE 3 $T$ being an endomorphism of a vector space $V$, suppose that $T^r = 0$ 
for some integer $r > 0$. Here we can take $g(t) = t^r$. From the identity 

$(1 - t)(1 + t + \cdots + t^{r-1}) = 1 - t^r$ 

in $K[t]$ we can take $f(t) = 1 - t$, $a(t) = 1 + t + \cdots + t^{r-1}$, $d(t) = 1$, in the 
notation of (3.7). Therefore, 

$(I - T)^{-1} = I + T + \cdots + T^{r-1}$ 

(This is a special case of Exercise 3 of the preceding section.) 
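
For a concrete check of Example 3, the following sketch uses a hypothetical strictly upper triangular matrix `t` (so that its cube is zero) and forms the finite sum $I + T + \cdots + T^{r-1}$. The function names are ours and the matrix is only a sample.

```python
def identity(n):
    """The n x n unit matrix."""
    return [[1 if i == j else 0 for j in range(n)] for i in range(n)]

def matmul(x, y):
    """Product of two square matrices given as lists of rows."""
    return [[sum(p * q for p, q in zip(row, col)) for col in zip(*y)] for row in x]

def matadd(x, y):
    """Entrywise sum of two matrices."""
    return [[p + q for p, q in zip(r, s)] for r, s in zip(x, y)]

def matsub(x, y):
    """Entrywise difference of two matrices."""
    return [[p - q for p, q in zip(r, s)] for r, s in zip(x, y)]

def geometric_inverse(t, r):
    """Return I + T + ... + T^(r-1), the candidate inverse of I - T when T^r = 0."""
    total = identity(len(t))
    power = identity(len(t))
    for _ in range(r - 1):
        power = matmul(power, t)
        total = matadd(total, power)
    return total

# Hypothetical nilpotent matrix: strictly upper triangular, so t^3 = 0.
t = [[0, 1, 2],
     [0, 0, 3],
     [0, 0, 0]]

inverse = geometric_inverse(t, 3)
one_minus_t = matsub(identity(3), t)

print(inverse)
print(matmul(one_minus_t, inverse) == identity(3))
print(matmul(inverse, one_minus_t) == identity(3))
```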


EXAMPLE 4 As in the preceding, let $T^r = 0$, and let $c$ be any nonzero scalar. From 
the identity 


nn + S)a1-8 


† By the Cayley-Hamilton theorem, see Sec. 5, Chap. 11. 




we conclude, just as in the foregoing, that 


am =Gtine... + 
¢ @ 


Similarly, 


EXAMPLE 5 Let $V$ be an $n$-dimensional vector space over a field $K$, and let $\mathbf{u}_1, 
\dots, \mathbf{u}_n$ be a base. Let $T$ be the linear mapping from $V$ to itself such that 

$T(\mathbf{u}_i) = \mathbf{u}_{i-1}$    ($i = 2, \dots, n$) 
$T(\mathbf{u}_1) = 0$ 

The matrix of $T$ relative to the given base has all entries zero except for a diagonal 
sequence of 1's above the main diagonal. The characteristic polynomial of $T$ is 
therefore $t^n$. Hence, as in Example 4 above, 


1 


oi 


po 


t - 1 

(1-7) =l¢ir4--- + 
EXAMPLE 6 Let $T$ be as in Example 4 above. Then $(T^2)^{[r/2]+1} = 0$, where $[r/2]$ 
denotes the greatest integer not exceeding $r/2$. Hence, by the same reasoning, 

etm Bae tye os bare) 
EXAMPLE 7 Let $T$ be an endomorphism of a vector space $V$, and let its characteristic 
polynomial be 

$\varphi(t) = t(t^2 - 1)$ 

Compute $(T^2 + I)^{-1}$. By the Cayley-Hamilton theorem we have $\varphi(T) = 0$. In 
the notation of (3.7) we take $g(t) = t(t^2 - 1)$ and $f(t) = t^2 + 1$. Divide $g(t)$ by 
$f(t)$, obtaining 

$t(t^2 - 1) = t \cdot (t^2 + 1) - 2t$ 

Now divide $t^2 + 1$ by the remainder, getting (provided that $2 \neq 0$ in $K$) 

$t^2 + 1 = -\dfrac{t}{2} \cdot (-2t) + 1$ 

Thus 

$1 = (t^2 + 1) + \dfrac{t}{2} \cdot (-2t) = (t^2 + 1) + \dfrac{t}{2}\,[t(t^2 - 1) - t(t^2 + 1)] = (1 - t^2/2)\,f(t) + (t/2)\,g(t)$ 

This is the method of Sec. 3, Chap. 6; and for $a(t)$ in (3.7) we can take $1 - t^2/2$. 
Therefore, 

$(T^2 + I)^{-1} = I - \tfrac{1}{2} T^2$ 
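
The result of Example 7 can be confirmed on a particular matrix. As an assumption for illustration only, we take for $T$ the companion matrix of $t^3 - t = t(t^2 - 1)$ over the rational numbers; any matrix with that characteristic polynomial would serve equally well.

```python
from fractions import Fraction

def matmul(x, y):
    """Product of two square matrices given as lists of rows."""
    return [[sum(p * q for p, q in zip(row, col)) for col in zip(*y)] for row in x]

n = 3
I = [[Fraction(int(i == j)) for j in range(n)] for i in range(n)]

# Hypothetical T: the companion matrix of t^3 - t, so that T^3 - T = 0.
T = [[Fraction(v) for v in row] for row in [[0, 0, 0],
                                            [1, 0, 1],
                                            [0, 1, 0]]]

T2 = matmul(T, T)
T3 = matmul(T2, T)

f_of_T = [[T2[i][j] + I[i][j] for j in range(n)] for i in range(n)]                    # T^2 + I
candidate = [[I[i][j] - Fraction(1, 2) * T2[i][j] for j in range(n)] for i in range(n)]  # I - T^2/2

print(T3 == T)                            # the relation g(T) = 0
print(matmul(f_of_T, candidate) == I)     # candidate is a right inverse
print(matmul(candidate, f_of_T) == I)     # and a left inverse
```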


EXERCISES 

1. Let T denote an endomorphism of a vector space V, with characteristic 
polynomial ¢(i). 

(a) Express T~ as a polynomial in T if g(t) = (@ 4+ 1% 

(6) Express (I + T?)-' as a polynomial in T if (f) = &. Can you do this for 
(T+ Thy 

(c) Express (I + T + T?)-1 as a polynomial in T if g(f) = &. 

(a) Express (I + T + T?)-! as a polynomial in Tif o() = @( — 1). 

(e) Find (+27 +P)“ if of) = ue +1). 

2. Let a be a square matrix with coefficients in a field K, and let g(t) be its 
characteristic polynomial. Express (1 + a + a*)~? as a polynomial in a if g(t) = 
(1 — 6% Express (I — a*)— as a polynomial in aif g() = 1 +¢ + 4, or at least 
try to. 

3. The notation being as in Exercise 2, suppose that the characteristic polynomial 
$\varphi(t)$ has no roots in $K$. Prove that $a - cI$ has an inverse for any $c$ in $K$. 


4. The differentiation operator 


As usual we denote the field of real numbers by $R$ and the field of complex numbers 
by $C$. By a complex-valued function on $R$ is meant a mapping $f$ from $R$ to $C$. If 
such a function $f$ happens to map $R$ into $R$, then it is called real-valued. Thus we 
understand the term "complex-valued function" as including real-valued functions. 
A simple example is the function that sends an arbitrary real number $x$ 
into $x^2$. This function is usually indicated by the symbol $x^2$. Similarly, the mapping 
that sends an arbitrary real number $x$ into $x^n$ ($n$ a positive integer) is usually 
indicated by the symbol $x^n$. It is real-valued. However, the function $f(x) = 
(2 + i) \cdot x^2$ is not real-valued. If $c$ is any fixed complex number, then the mapping 
that sends every real number into $c$ is a complex-valued function. Such 
functions are called constants, or constant functions. 

Let $F^*$ denote the set of all complex-valued functions. If $f$ and $g$ are two of its 
elements, then we define $f + g$ to be the function that sends an arbitrary real 
number $x$ into $f(x) + g(x)$. If $c$ is any complex number, then we define $cf$ to be 
the function that sends any real number $x$ into $c \cdot f(x)$. It is easily verified that 
$F^*$, equipped with these two operations, is a vector space over $C$. Its zero element 
is the constant function 0 that maps all real numbers into 0. We can in fact make 
$F^*$ into a ring by defining the product $fg$ of two of its elements to be the function 
that maps an arbitrary real number $x$ into $f(x) \cdot g(x)$.† The three operations just 
defined can be described briefly by the following equations: 


† Not to be confused with the composition of two mappings, which in general here does 
not make sense. 




       $(f + g)(x) = f(x) + g(x)$ 
4.1    $(c \cdot f)(x) = c \cdot f(x)$ 
       $(fg)(x) = f(x) \cdot g(x)$ 

where $x$ denotes any real number, $c$ denotes any complex number, and where $f$ 
and $g$ are any complex-valued functions on $R$. As already mentioned, the first 
two operations make $F^*$ into a vector space over $C$. The first and third operations 
make $F^*$ into a commutative ring whose unit element is the constant function, 
denoted simply by 1, that maps all of $R$ into the number 1. $F^*$ with all three 
operations defined above is a $C$-algebra (Sec. 4, Chap. 8). We observe that if 
$c$ is a complex number, and if we also denote by $c$ the constant function that maps 
all of $R$ into $c$, then the symbol $cf$ is defined by both the second and third equations 
of (4.1); the two definitions are clearly the same. 
The system F* is not very interesting for our purposes, and we are now going 
to define a certain subsystem: 
A complex-valued function $f$ on $R$ is called differentiable if, for every real number 
$x$, the limit 

4.2    $\lim_{h \to 0} \dfrac{f(x + h) - f(x)}{h}$ 

exists.† This means that there exists a complex number $L$ for which the following 
is true: given any positive real number $\varepsilon$, there is a positive real number $d$ such that 

$\left| \dfrac{f(x + h) - f(x)}{h} - L \right| < \varepsilon$ 

for all real numbers $h \neq 0$ with $-d < h < d$. The number $L$ satisfying this requirement, 
if it exists, is unique and is denoted by the symbolism of (4.2). Obviously 
the limit $L$ will in general depend upon $x$, and we shall sometimes denote it by 
$f'(x)$. Thus, if $f$ is a differentiable function, then by definition 

4.3    $f'(x) = \lim_{h \to 0} \dfrac{f(x + h) - f(x)}{h}$ 

Since $f'$ assigns a complex number $f'(x)$ to every real number $x$, $f'$ is again a complex-valued 
function, that is an element of $F^*$. The new function $f'$ obtained in 
this way from a (differentiable) function $f$ is called the derivative of $f$. If $f'$ happens 
to be differentiable, then we can form its derivative, denoted by $f''$; $f''$ 
is also called the second derivative of $f$. If in turn $f''$ is differentiable, then its 
derivative is denoted by $f'''$, called the third derivative of $f$, and so on. If the 
process can be repeated $n$ times, we obtain the $n$th derivative $f^{(n)}$ of $f$. 
† The notion of limit that arises here is very similar to the one encountered in Chaps. 4 and 
5 in connection with Cauchy sequences. 

A definition of differentiability, identical in appearance to the foregoing, can also be made 
for mappings $C \to C$ instead of from $R$ to $C$, both $x$ and $h$ then being allowed to be complex 
numbers. 




Other commonly used symbols for the derivative $f'$ of a differentiable function 
$f$ are $df/dx$ and $Df$. The corresponding symbols for the $n$th derivative of $f$ (assuming 
it exists) are $d^n f/dx^n$ and $D^n f$. 

A complex-valued function $f$ is called indefinitely differentiable if its successive 
derivatives $f'$, $f''$, $f'''$, . . . all exist. We shall be concerned here only with indefinitely 
differentiable functions. Some examples will be given presently. 

Observe that if a differentiable function $f$ is real-valued, then its derivative $f'$ 
is also real-valued. For the difference quotient 

4.4    $\dfrac{f(x + h) - f(x)}{h}$ 

appearing in the definition (4.3) is real for any choice of $x$ and $h$ (they are always 
supposed to be real), and therefore the limit of that quantity as $h \to 0$ is a real 
number. 

If $f$ is any complex-valued function on $R$, let $u(x)$ denote the real part of $f(x)$ 
and let $v(x)$ denote the imaginary part of $f(x)$, so that $f(x) = u(x) + iv(x)$. Then 
$u$ and $v$ are two real-valued functions, and this decomposition of $f$ is unique 
(Theorem 1.1, Chap. 5). The difference quotient (4.4) can be written as 

$\dfrac{u(x + h) - u(x)}{h} + i\,\dfrac{v(x + h) - v(x)}{h}$ 

If $f$ is differentiable, then both the real and imaginary parts of its difference 
quotient must have limits as $h \to 0$, as is easily seen, and therefore both $u$ and $v$ 
must be differentiable functions, and $f' = u' + iv'$. Hence derivative calculations 
with complex-valued functions can be reduced in a trivial way to similar 
calculations with real-valued functions. 


EXAMPLE 1 Let $c$ be a complex number, and consider the constant function 
$f(x) = c$ for all real $x$. The difference quotient (4.4) for this function is zero for 
all $x$ and $h \neq 0$ and consequently has the limit zero as $h \to 0$, whatever $x$. Hence, 
a constant function is differentiable, and its derivative is the constant function 0. 
It follows that a constant function is indefinitely differentiable, all its derivatives 
being zero. 


EXAMPLE 2 Consider the function $f(x) = x$ for all real numbers $x$. The difference 
quotient (4.4) is equal to $[(x + h) - x]/h = 1$ for all $x$ and for all $h \neq 0$. It 
follows at once that the derivative of this function is the constant function 1. 
Hence $f'$ is differentiable, and $f'' = 0$. All the successive derivatives exist and 
are zero. 


EXAMPLE 3 Let $f(x) = x^n$ for all $x$, $n$ being a positive integer. From the binomial 
theorem (Theorem 5.2, Chap. 3) we have 

$(x + h)^n = x^n + nx^{n-1}h + \binom{n}{2}x^{n-2}h^2 + \cdots + h^n$ 

Therefore the difference quotient (4.4) for this $f$ is 

$nx^{n-1} + \binom{n}{2}x^{n-2}h + \cdots + h^{n-1}$ 

The limit of this as $h \to 0$ is $nx^{n-1}$, and so $f'(x) = nx^{n-1}$. This is again a differentiable 
function, by the same argument, and its derivative is $f''(x) = 
n(n - 1)x^{n-2}$. If $n = 2$ this is a constant function, with derivative equal to 
zero. If $n > 2$, the process can be repeated, giving $f'''(x) = n(n - 1)(n - 2)x^{n-3}$, 
etc. Starting with $x^n$ we then get the following sequence of derivatives: 

$nx^{n-1},\ n(n - 1)x^{n-2},\ n(n - 1)(n - 2)x^{n-3},\ \dots,\ n!,\ 0,\ 0,\ \dots$ 

That is, the $n$th derivative is the constant function $n!$, and all further derivatives 
are zero. 


EXAMPLE 4 Let $f(x) = a_0 + a_1 x + \cdots + a_n x^n$, where $a_0, a_1, \dots, a_n$ are any 
complex numbers. Using the results of Example 3 it is easy to see that $f$ is differentiable, 
and 

$f'(x) = a_1 + 2a_2 x + \cdots + na_n x^{n-1}$ 

again a polynomial. 
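
The rule of Example 4 is easy to mechanize. In the sketch below, which is an illustration under our own conventions rather than part of the theory, a polynomial is stored as its coefficient list `[a0, a1, ..., an]`; the functions `derivative` and `nth_derivative` are hypothetical names. Applied to $x^4$ it reproduces the sequence of Example 3, ending in the constant $4!$ followed by zero.

```python
from math import factorial

def derivative(coeffs):
    """Derivative of a0 + a1*x + ... + an*x^n, given as [a0, a1, ..., an]."""
    # The term ak*x^k contributes k*ak*x^(k-1); the constant term drops out.
    return [k * c for k, c in enumerate(coeffs)][1:] or [0]

def nth_derivative(coeffs, n):
    """Apply the derivative n times in succession."""
    for _ in range(n):
        coeffs = derivative(coeffs)
    return coeffs

x4 = [0, 0, 0, 0, 1]                      # the polynomial x^4

for k in range(1, 6):
    print(k, nth_derivative(x4, k))       # successive derivatives of x^4

print(nth_derivative(x4, 4) == [factorial(4)])
```

Complex coefficients can be used as well, since the rule only multiplies each coefficient by an integer.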
EXAMPLE 5 Let $f(x) = e^x = 1 + \dfrac{x}{1!} + \dfrac{x^2}{2!} + \dfrac{x^3}{3!} + \cdots$ 

(see Sec. 4, Chap. 4). It is a standard theorem of calculus that the infinite series 
converges for all $x$ and that $f'(x) = f(x)$. This function is equal to its own derivative 
and is therefore indefinitely differentiable. More generally, putting $f(x) = 
e^{ax} = 1 + ax + (ax)^2/2! + (ax)^3/3! + \cdots$, where $a$ is any complex number, 
we have $f'(x) = ae^{ax}$, and this function is also indefinitely differentiable. 

If $f$ and $g$ are two differentiable functions, then it is an elementary theorem of 
calculus, easily proved from the definition above, that $f + g$ is differentiable, and 
$(f + g)' = f' + g'$. If $c$ is any complex number, then $cf$ is differentiable, and 
$(cf)' = c \cdot f'$. Furthermore $f \cdot g$ is differentiable, and $(fg)' = f'g + fg'$. In particular, 
if $f$ and $g$ are indefinitely differentiable, then so are $f + g$, $cf$, and $fg$. 

Now let us denote by $F$ the set of all indefinitely differentiable complex-valued 
functions on $R$. From the remarks just made, $F$ is a subspace of $F^*$. Since the 
product of two elements of $F$ is again in $F$, it follows that $F$ is also an algebra. 

We denote by $D$ the mapping of $F$ to itself that sends an arbitrary function $f$ 
into its derivative $f'$. We shall usually write $Df$ instead of $f'$. The rules cited 
above become 

       $D(f + g) = Df + Dg$ 
4.5    $D(cf) = c \cdot Df$ 
       $D(fg) = (Df) \cdot g + f \cdot (Dg)$ 



where $f$ and $g$ are any functions in $F$ and where $c$ is any complex number. The 
first two rules say that $D$ is a linear mapping of the vector space $F$ to itself. The 
third rule ("product rule") contains the second, for $Dc = 0$ for a constant function. 
$D$ is called the differentiation operator. 

Since $D$ maps $F \to F$, we can form the composition $D \circ D = D^2$. This means, 
as usual, $D$ applied twice. Thus $D^2 f = D(Df) = D(f') = f''$. That is, $D^2 f$ is what 
we have called the second derivative of $f$. Similarly, $D^n f = f^{(n)}$ = $n$th derivative 
of $f$, for any positive integer $n$. By $D^0$ or $I$ we shall sometimes denote the identity 
mapping of $F$. From our examples above we have $Dx^n = nx^{n-1}$, $D^n x^n = n!$ ($n$ 
a positive integer), $De^{ax} = ae^{ax}$, etc. 

Just as with any endomorphism of a vector space, we can form “polynomials” 
in D with complex coefficients, any such being again a linear mapping of F to itself. 
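
To make the notion of a polynomial in $D$ concrete, here is a brief sketch restricted to polynomial functions, which are stored as coefficient lists. The operator $p(D) = c_0 I + c_1 D + \cdots + c_m D^m$ is likewise given by the list `[c0, c1, ..., cm]`. The names `apply_operator`, `add`, and `scale`, and the sample function $x^3$, are assumptions made for this illustration.

```python
def derivative(coeffs):
    """Derivative of a0 + a1*x + ... + an*x^n, given as [a0, a1, ..., an]."""
    return [k * c for k, c in enumerate(coeffs)][1:] or [0]

def add(f, g):
    """Sum of two polynomials given as coefficient lists."""
    size = max(len(f), len(g))
    f = f + [0] * (size - len(f))
    g = g + [0] * (size - len(g))
    return [p + q for p, q in zip(f, g)]

def scale(c, f):
    """The polynomial c*f."""
    return [c * p for p in f]

def apply_operator(p, f):
    """Apply p(D) = p[0]*I + p[1]*D + ... + p[m]*D^m to the polynomial f."""
    result = [0]
    current = list(f)                 # holds D^k f at step k
    for c in p:
        result = add(result, scale(c, current))
        current = derivative(current)
    return result

operator = [1, -2, 1]                 # I - 2D + D^2
f = [0, 0, 0, 1]                      # hypothetical sample function x^3

print(apply_operator(operator, f))    # coefficients of f'' - 2f' + f
```

For $f = x^3$ we have $f' = 3x^2$ and $f'' = 6x$, so the printed list represents $x^3 - 6x^2 + 6x$.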


Exampce ¢ For the operator D? — 2D + I we have, by definition, (B® — 2D + Df 
=D¥-—WDW+fay"—2f +f. If f= 3x2 42, then the result is 32° — 
1822 + « + 16, 

We point out explicitly that our space $F$ contains all polynomials with complex 
coefficients and all exponentials $e^{ax}$, where $a$ is any complex number. Since sums 
and products of elements of $F$ are again in $F$, it follows that $F$ contains all functions 
which can be built by a finite number of sum and product operations from 
polynomials and exponentials. 


EXERCISES. 
1. In the following determine the polynomial p(é) such that the indicated ex- 
pression is the same as p(D)f: 


ef. af _ 
) Ge +8 ae — OF 
bf af 
© os ae 


(co) #8 — 16/4 2if" — Var 


nC) (E-) 


3. Is = éé -/+ g -/) = (DD —) +1 - HIP? 


cui) +(e) 


(D+1h@ —- py?


yy?


5. Let f, g be two functions in F, and let n be a positive integer. Prove Leibniz’s
formula:

        $D^n(fg) = \sum_{k=0}^{n} \binom{n}{k} (D^k f) \cdot (D^{n-k} g)$


6. If p(t), q(t), r(t) are polynomials in C[t] such that $p(t) = q(t) \cdot r(t)$, is it true
that $p(D) = q(D) \circ r(D)$? State your reasons.


352 Rings of operators and differential equations Ch. 12, See. 5 


7. Compute (D! + 3D? — D)(42° + 2x — x +1). 
8. Let k be an element of F. Show that the mapping kD of F to itself that
sends an arbitrary f into $k \cdot Df = kf'$ is a linear mapping. Compute the operator
D> (1 — 2)D and the effect of it applied to a polynomial of 4th degree. 
*9. Let $T: F \to F$ be a mapping for which the rules (4.5) are valid. Prove that
there is a function k in F such that T = kD.


5. Some differentiation formulas 


From (4.5) it is easy to deduce some useful rules for differentiation, and we list a few 
here, referring to standard calculus texts for more details. 

First, if f(x) and g(x) are two differentiable complex-valued functions on R, we
cannot in general form either of the two composite mappings f(g(x)) or g(f(x)).
But if one of the functions, say f(x), is real-valued, then the composition g(f(x))
makes sense [we do not write this as gf because of the confusion with the product
of g and f defined in (4.1)]. This composite function is again differentiable, and
we have the “chain rule”

5.1     $(g(f(x)))' = g'(f(x)) \cdot f'(x)$

For example, if f(x) = cx, c being a real number, and if $g(x) = e^x$, then $g(f(x)) = e^{cx}$,
and the rule above gives us

        $(e^{cx})' = ce^{cx}$

As pointed out earlier, this rule holds for complex c also. We write the rule as†

5.2     $D(e^{cx}) = ce^{cx}$

Combining this with the product rule of (4.5) we get, for any f in F,

5.3     $D(e^{cx}f) = (De^{cx})f + e^{cx}Df = ce^{cx}f + e^{cx}Df$

or

5.4     $D(e^{cx}f) = e^{cx}(cI + D)f$

where again I is the identity mapping of F.
Applying D to both sides, we get

        $D^2(e^{cx}f) = D[e^{cx}(cI + D)f]$

and using (5.4) here, with (cI + D)f in place of f, we get

        $e^{cx}(cI + D)(cI + D)f$

or

5.5     $D^2(e^{cx}f) = e^{cx}(cI + D)^2 f$


† In the language of Chap. 11, $e^{cx}$ is an eigenfunction of the operator D.




Applying D repeatedly in this way, it is easily seen by a simple induction argument
that

5.6     $D^k(e^{cx}f) = e^{cx}(cI + D)^k f$

for any positive integer k.
Now let

5.7     $p(t) = \sum_{k=0}^{n} a_k t^k$

be any polynomial with complex coefficients in an indeterminate t. From (5.6)
we get

        $p(D)(e^{cx}f) = \left(\sum_{k=0}^{n} a_k D^k\right)(e^{cx}f)$
                       $= \sum_{k=0}^{n} a_k D^k(e^{cx}f)$
                       $= \sum_{k=0}^{n} a_k e^{cx}(cI + D)^k f$

or

5.8     $p(D)(e^{cx}f) = e^{cx}p(cI + D)f$

If we replace c by $-c$ here, then the formula reads

5.9     $p(D)(e^{-cx}f) = e^{-cx}p(D - cI)f$

This formula is basic for the sequel; writing $e^{-cx}f = g$, we get the formula

5.10    $e^{cx} \cdot p(D)g = p(D - cI)(e^{cx}g)$

EXAMPLE  Here we have $p(t) = 1 + t + t^2$ and c = 3:

        $(I + D + D^2)(xe^{3x}) = e^{3x} \cdot [I + (3I + D) + (3I + D)^2]x$
                               $= e^{3x} \cdot [x + 3x + 1 + (3I + D)(3x + 1)]$
                               $= e^{3x}(x + 3x + 1 + 9x + 3 + 3)$
                               $= e^{3x}(7 + 13x)$
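The shift formula (5.8) can be checked mechanically on examples of this kind. The sketch below is an illustration only, under the assumption that a function $e^{cx}q(x)$ is stored as the number c together with the coefficient list of q; the helper names `direct`, `shift` and `via_shift` are hypothetical. One routine differentiates $e^{cx}q$ repeatedly by the product rule (5.3), the other expands $p(t + c)$ and applies the resulting polynomial in D to q alone.

```python
from fractions import Fraction

def deriv(q):
    """Derivative of the polynomial q = [q0, q1, ...], padded to the same length."""
    return [k * q[k] for k in range(1, len(q))] + [Fraction(0)]

def direct(p, c, q):
    """Polynomial r with p(D)(e^{cx} q) = e^{cx} r, using only the product rule (5.3)."""
    total = [Fraction(0)] * len(q)
    term = list(q)                                   # polynomial factor of D^0 (e^{cx} q)
    for a in p:
        total = [t + a * s for t, s in zip(total, term)]
        term = [c * s + d for s, d in zip(term, deriv(term))]   # one more application of D
    return total

def shift(p, c):
    """Coefficients of p(t + c) in powers of t, by Horner's scheme."""
    out = [Fraction(0)]
    for a in reversed(p):
        nxt = [Fraction(0)] * (len(out) + 1)
        for k, v in enumerate(out):
            nxt[k] += c * v          # contribution of c * out
            nxt[k + 1] += v          # contribution of t * out
        nxt[0] += a
        out = nxt
    return out

def via_shift(p, c, q):
    """Polynomial p(cI + D) q, the factor multiplying e^{cx} in (5.8)."""
    total = [Fraction(0)] * len(q)
    power = list(q)
    for a in shift(p, c):
        total = [t + a * s for t, s in zip(total, power)]
        power = deriv(power)
    return total

# The example above: p(t) = 1 + t + t^2, c = 3, q(x) = x
print(direct([1, 1, 1], 3, [0, 1]), via_shift([1, 1, 1], 3, [0, 1]))
```

Both routines return the coefficients of the polynomial factor, lowest degree first, so agreement of the two lists is exactly the statement of (5.8) for the chosen p, c and q.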


EXERCISES 
1, Compute Dez), 
2, Compute D¥(22e"). 
3. Compute (I + D)§ (e* - 2). 
Find functions / satisfying the indicated conditions: 
4, Df(x) = x*(n an integer >0) 
8. Df(x) =e 




6 Df(x) = ze* 
7. Df(z) = xe 

8. Df(x) = 2"e* (nan integer >0). 

9 D'f(z) = 0 (find more than one answer). 

10. D°{e-*f(x)) = 0 (find more than one answer). 

11, (D? + Df = 0 (find more than one f). 

12, Deduce (5.6) from Leibniz’s rule (Exercise 5, Sec. 4). 


6. Linear differential equations with constant coefficients 


A differential equation is, loosely speaking, a relation connecting an unknown
function and certain of its derivatives. The problem, of course, is to find the
unknown function. A simple example is the equation

        $f' + f = 0$

A solution of it is $f(x) = e^{-x}$, by (5.2), and in fact any solution must be of the
form $f = ce^{-x}$ (c a constant). Another example is

        $f'' - f = 0$

of which $e^x$ and $e^{-x}$ are solutions, again by (5.2). Every solution must be of the
form $ae^x + be^{-x}$, with a and b arbitrary constants. The simplest differential
equation imaginable is

        $f'(x) = g(x)$

where g(x) is some given function. The solution of this equation is the chief
subject of the integral calculus and is outside our province. We shall not go into
the matter except for some rather special choices of g(x). Certain kinds of dif- 
ferential equations, apparently much more complex than the one just mentioned, 
present algebraic peculiarities which render their solution quite simple, and we 
shall confine our attention to such equations. The type we propose to study here 
is of great practical importance. 

Sometimes differential equations arise in the form of systems consisting of sev- 
eral equations involving several unknown functions and their derivatives. We 
shall consider some examples in Sec. 9. 

Differential equations are of basic importance in most physical applications of 
mathematics, 

A linear differential equation is one of the type

        $a_n \frac{d^n f}{dx^n} + a_{n-1} \frac{d^{n-1} f}{dx^{n-1}} + \cdots + a_1 \frac{df}{dx} + a_0 f = g$

where $a_0, a_1, \ldots, a_n$, g are given functions. We shall consider here only the
case in which the coefficients $a_0, \ldots, a_n$ are constants. With the D notation
the equation can be written

6.1     $a_n D^n f + a_{n-1} D^{n-1} f + \cdots + a_1 Df + a_0 f = g$

or

6.2     $(a_n D^n + a_{n-1} D^{n-1} + \cdots + a_1 D + a_0 I)f = g$


If we define the polynomial p(t) by

6.3     $p(t) = \sum_{k=0}^{n} a_k t^k$

then our equation is

6.4     $p(D)f = g$

Since our field of scalars is C, every polynomial in C[t] can be written as a product
of factors of the first degree (Theorem 4.2, Chap. 6), and so p(t) can be put in the
form

6.5     $p(t) = a_n (t - c_1)^{k_1} (t - c_2)^{k_2} \cdots (t - c_r)^{k_r}$

where $c_1, \ldots, c_r$ are the distinct roots of p(t), the k's being positive integers.
As we have seen, (6.5) remains correct upon substitution of the endomorphism
D for t (see Sec. 3), and so (6.4) is the same as

6.6     $a_n (D - c_1 I)^{k_1} (D - c_2 I)^{k_2} \cdots (D - c_r I)^{k_r} f = g$


It is therefore clear that the study of linear differential equations with constant 
coefficients is closely connected with the study of operators of the type (D — cI). 

We now take up that question. 

The differentiation operator D is a linear mapping of the vector space F of in- 
definitely differentiable functions to itself. D is not an isomorphism, for D maps 
every constant function into 0, as was indicated in Example 1, Sec. 4. Conversely, 
it can be shown that f must be a constant function if Df = 0 (this is a well-known 
theorem of calculus). Hence, 


6.7  The kernel of $D: F \to F$ is the one-dimensional subspace consisting of the
constant functions.


If Df = Dg, then D(f − g) = 0, and so f − g = constant. Therefore we see
that


6.8  Df = Dg if and only if f = g + c, where c is a constant


Now let us determine the kernel of $D^n$. In Sec. 4 we saw that $Dx^m = mx^{m-1}$,
and by repeated applications of D we get $D^m x^m = m!$. Hence $D^n x^m = 0$ if n > m.
Let us denote by $P_k$ the subspace of F consisting of all polynomials with complex
coefficients, of degree less than k. Thus in particular $P_1$ consists of all polynomials
of degree zero, hence of all the constant functions. By $P_0$ we shall denote the
subspace of F consisting of the zero function alone.


6.9  The operator D maps $P_k$ onto $P_{k-1}$ for every positive integer k.


For if f is in $P_k$, then clearly Df must have degree 1 less than the degree of f,
and so Df has degree < k − 1, since f has degree < k. On the other hand, if

6.10    $g(x) = a_0 + a_1 x + \cdots + a_{k-2} x^{k-2}$

is any element of $P_{k-1}$, then, putting

6.11    $h(x) = a_0 x + \frac{a_1}{2} x^2 + \cdots + \frac{a_{k-2}}{k-1} x^{k-1}$

we have Dh = g, showing that D maps $P_k$ onto all of $P_{k-1}$.
By a very similar argument we prove that 


6.12  If Df is in $P_{k-1}$, then f is in $P_k$.


For put Df = g, and let h(x) be the polynomial obtained from g as in (6.11).
Then Dh = g, and so f = h + c, where c is a constant, by (6.8), since Df = Dh.
Hence f is a polynomial of the same degree as h and is therefore in $P_k$.


6.13  The kernel of $D^n$ is $P_n$.


This is true for n = 1, by (6.7). Suppose it is true for n − 1, n being some
integer ≥ 2. Now $D^n f = 0$ if and only if $D^{n-1}(Df) = 0$, which is true if and only
if Df is in the kernel of $D^{n-1}$. By assumption, that kernel is $P_{n-1}$, and Df is in
$P_{n-1}$ if and only if f is in $P_n$, by (6.9) and (6.12).


6.14  $P_n$ is a vector space over C of dimension n, and $1, x, \ldots, x^{n-1}$ form a base.


For if $c_0, c_1, \ldots, c_{n-1}$ are any complex numbers, not all zero, then
$c_0 + c_1 x + \cdots + c_{n-1} x^{n-1}$ cannot be the zero function, since this polynomial
cannot have more than n − 1 roots (Theorem 4.1, Chap. 6).


We now introduce the subspace $P_{n,c}$ of F consisting of all the elements of $P_n$
multiplied by $e^{cx}$.


6.15  $P_{n,c}$ is a vector space over C of dimension n, and $e^{cx}, xe^{cx}, \ldots, x^{n-1}e^{cx}$
form a base.


This follows at once from (6.14), since $e^{cx}$ cannot vanish for any x because
$e^{cx} e^{-cx} = 1$. Therefore $e^{cx}(c_0 + c_1 x + \cdots + c_{n-1} x^{n-1})$ cannot be the zero
function unless all the coefficients $c_0, \ldots, c_{n-1}$ are zero.

We can apply formula (5.8) to calculate the kernel of $(D - cI)^n$. Taking $p(t) = t^n$
in (5.8), with $-c$ in place of c as in (5.9), we get

6.16    $D^n(e^{-cx}f) = e^{-cx}(D - cI)^n f$

Since $e^{-cx} \neq 0$ for any x, it follows that $(D - cI)^n f = 0$ if and only if $D^n(e^{-cx}f) = 0$.
From (6.13) it follows that $e^{-cx}f$ is then a polynomial in $P_n$, and so f is equal to $e^{cx}$ times
a polynomial in $P_n$. That is,

6.17  The kernel of $(D - cI)^n$ is $P_{n,c}$.


Consider an element $e^{cx}g(x)$ in $P_{n,c}$ [that is, g(x) is in $P_n$]. We have [see Eq.
(5.4)]

        $D(e^{cx}g(x)) = e^{cx}(D + cI)g(x)$

Since (D + cI)g(x) is again in $P_n$, it follows that $D(e^{cx}g(x))$ is again in $P_{n,c}$. Hence,

6.18  D maps $P_{n,c}$ into itself.


6.19  Denoting by T† the operator D restricted to $P_{n,c}$ (that is, Tf = Df for f in
$P_{n,c}$), the characteristic polynomial of T is $(t - c)^n$.


To show this we observe that the functions

        $u_k = x^k e^{cx}$    (k = 0, 1, ..., n − 1)

form a base for $P_{n,c}$, by (6.15). We have, using (5.4),

        $(D - cI)u_k = e^{cx} D(x^k) = k u_{k-1}$  for k > 0,    $(D - cI)u_0 = 0$

Recall that $Tu_k = Du_k$, and so we have

        $Tu_k = cu_k + ku_{k-1}$    (k = 1, ..., n − 1)
        $Tu_0 = cu_0$

The matrix of T with respect to the base $u_0, u_1, \ldots, u_{n-1}$ of $P_{n,c}$ is therefore
triangular, with every diagonal entry equal to c,
and the characteristic polynomial of this matrix, hence of the endomorphism T, is
clearly $(t - c)^n$, as claimed.
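The relations $Tu_k = cu_k + ku_{k-1}$ determine the matrix of T completely once a convention is fixed. The following sketch assumes, purely for illustration, that column k holds the coordinates of $Tu_k$ (the text's own convention may be the transpose); the function names are hypothetical. It also checks numerically that $(T - cI)^n = 0$ while $(T - cI)^{n-1} \neq 0$.

```python
def matrix_of_T(n, c):
    """Matrix of T on P_{n,c} in the base u_k = x^k e^{cx}, k = 0..n-1.

    Assumed convention: column k holds the coordinates of T u_k, so
    T u_k = c u_k + k u_{k-1} puts c at position (k, k) and k at (k-1, k).
    """
    M = [[0] * n for _ in range(n)]
    for k in range(n):
        M[k][k] = c
        if k > 0:
            M[k - 1][k] = k
    return M

def matmul(A, B):
    """Product of two square matrices given as lists of rows."""
    n = len(A)
    return [[sum(A[i][j] * B[j][k] for j in range(n)) for k in range(n)]
            for i in range(n)]

def power_of_T_minus_cI(n, c, power):
    """The matrix of (T - cI)**power."""
    N = matrix_of_T(n, c)
    for i in range(n):
        N[i][i] -= c                      # subtract c times the identity
    result = [[int(i == j) for j in range(n)] for i in range(n)]
    for _ in range(power):
        result = matmul(result, N)
    return result

for row in matrix_of_T(4, 2):
    print(row)
```

Because the matrix is triangular with c repeated n times on the diagonal, its characteristic polynomial is $(t - c)^n$ under either convention.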


6.20  If b is a complex number different from c, then the operator $(T - bI)^m$ has an
inverse for any positive integer m.


This follows at once from Sec. 3 since the corresponding polynomials $(t - b)^m$
and $(t - c)^n$ are relatively prime.


† We introduce a new letter here because the situation at hand depends strongly on the
particular subspace $P_{n,c}$ under consideration. It would not do to use D indiscriminately.




It is important for us to have an efficient method for computing $(T - bI)^{-1}$.
To do so we have only to find polynomials p(t) and q(t) such that
$p(t) \cdot (t - b) + q(t)(t - c)^n = 1$. As we have had occasion to remark earlier, this can
be done by starting from the general identity

6.21    $(y - 1)(1 + y + y^2 + \cdots + y^{n-1}) = y^n - 1$

In this we put

6.22    $y = \frac{t - c}{b - c}$

(legitimate because b ≠ c). After a routine simplification, we get

6.23    $-\frac{1}{b - c}(t - b)\left[1 + \frac{t - c}{b - c} + \cdots + \left(\frac{t - c}{b - c}\right)^{n-1}\right] + \left(\frac{t - c}{b - c}\right)^n = 1$

Then indeed we have 1 expressed as a combination of (t − b) and $(t - c)^n$. Now
in this polynomial identity we may substitute the operator T for t, and recalling
that $(T - cI)^n = 0$, by the Cayley-Hamilton theorem, we get

6.24    $-\frac{1}{b - c}(T - bI)\left[I + \frac{T - cI}{b - c} + \cdots + \left(\frac{T - cI}{b - c}\right)^{n-1}\right] = I$

That is,

6.25    $(T - bI)^{-1} = -\frac{1}{b - c}\left[I + \frac{T - cI}{b - c} + \cdots + \left(\frac{T - cI}{b - c}\right)^{n-1}\right]$

Recall that T was used in place of D to indicate that we are considering only a
subspace of F, namely, $P_{n,c}$. The formula just obtained is not true for all of F, because
D − bI does not have an inverse on F. However, the fact that the formula is
limited to the subspace $P_{n,c}$ is sufficiently indicated by the T on the left, and we may
safely write D on the right. Doing so and raising both sides of (6.25) to the mth
power, we get the general formula

6.26    $(T - bI)^{-m} = p(D - cI)$

where p(t) is the polynomial

6.27    $p(t) = \left[-\frac{1}{a}\left(1 + \frac{t}{a} + \frac{t^2}{a^2} + \cdots + \frac{t^{n-1}}{a^{n-1}}\right)\right]^m$    (a = b − c)

Observe that, since $(T - cI)^n = 0$, all powers of D − cI exceeding the (n − 1)th
can be omitted.


THEOREM 6.1  Let g(x) be a polynomial of degree < n, and let b, c be two complex
numbers, with b ≠ c. Then every solution of the differential equation

6.28    $(D - bI)^m f = e^{cx}g(x)$

has the form

6.29    $f(x) = e^{cx}\left[-\frac{1}{a}\left(I + \frac{D}{a} + \cdots + \frac{D^{n-1}}{a^{n-1}}\right)\right]^m g(x) + h(x)$

where h(x) is an element of $P_{m,b}$ and where a = b − c. Conversely, (6.29) is a
solution of (6.28) for any h in $P_{m,b}$.
    Proof.  Let $f_1(x)$ stand for the first function on the right of (6.29), so
that

        $f_1(x) = e^{cx} \cdot p(D)g(x)$

where p(t) is the polynomial (6.27). Then

        $f_1(x) = p(D - cI)(e^{cx}g(x))$

by (5.10), and $e^{cx}g(x)$ is in $P_{n,c}$. Therefore, by (6.26),

        $(D - bI)^m f_1 = (T - bI)^m f_1 = (T - bI)^m (T - bI)^{-m}(e^{cx}g(x)) = e^{cx}g(x)$

That is, $f_1$ is a solution of (6.28). If f is another solution, then†
$(D - bI)^m(f - f_1) = 0$, and so $f - f_1$ is in $P_{m,b}$, by (6.17). That is,
$f = f_1 + h$, where h is in $P_{m,b}$. Conversely, if h is in $P_{m,b}$, then $(D - bI)^m h = 0$,
and so $f_1 + h$ is a solution of (6.28).  Q.E.D.


The following theorem takes care of (6.28) for the case c = b.


THEOREM 6.2  Let g(x) be a polynomial, and let b be a complex number. Then every
solution of the differential equation

6.30    $(D - bI)^m f = e^{bx}g(x)$

is of the form

6.31    $f = e^{bx}g_1(x) + h(x)$

where $g_1(x)$ is a polynomial such that $D^m g_1 = g$ and where h is in $P_{m,b}$. Conversely
every such function f is a solution of (6.30).
    Proof.  Put $f_1 = e^{bx}g_1(x)$, where $g_1$ is any polynomial such that $D^m g_1 = g$.
We have

† Strictly speaking, we must assume that f is indefinitely differentiable here in order to
apply (6.17). But if we assume that f has only as many derivatives as required for (6.28),
namely, m, it is easy to deduce that f must be indefinitely differentiable. For the equation
can be written

        $f^{(m)} = a_1 f^{(m-1)} + a_2 f^{(m-2)} + \cdots + a_m f + e^{cx}g(x)$

where $a_1, \ldots, a_m$ are constants. Since every term on the right is differentiable, it
follows that $f^{(m)}$ is differentiable, and so $f^{(m+1)}$ exists. By a simple induction one sees
easily that all the derivatives of f exist.


        $(D - bI)^m f_1 = (D - bI)^m(e^{bx}g_1) = e^{bx}D^m g_1 = e^{bx}g(x)$

by (5.10), showing that $f_1$ is a solution of (6.30). Now let f be any solution
of (6.30). As in the footnote on page 359, f must be indefinitely differentiable.
We have $(D - bI)^m(f - f_1) = 0$, and so $f - f_1$ is in $P_{m,b}$, by (6.17).
Thus $f = f_1 + h$, where h is in $P_{m,b}$. Conversely, if h is in that subspace,
then $(D - bI)^m(f_1 + h) = (D - bI)^m f_1 = e^{bx}g(x)$.


In the next section we shall show how repeated applications of these two theo- 


rems lead to the solution of (6.6) under certain assumptions concerning the right- 
hand side. 


EXAMPLE 1  Solve $(D + 2I)^3 f = e^{-x}(1 + x^2)$. This case is covered by Theorem
6.1, with m = 3, $b = -2$, $c = -1$, n = 3. Then $a = b - c = -1$. Hence, the
solution is $f = f_1 + h$, where h is any element of $P_{m,b} = P_{3,-2}$, and where

        $f_1 = e^{-x}\left[-\frac{1}{a}\left(I + \frac{D}{a} + \frac{D^2}{a^2}\right)\right]^3 (1 + x^2)$
             $= e^{-x}[I - D + D^2]^3 (1 + x^2)$

Since $D^k(1 + x^2) = 0$ if k > 2, we can throw out all powers of D on the right
exceeding 2. Then

        $f_1 = e^{-x}[I - 3D + 6D^2](1 + x^2)$
             $= e^{-x}[(1 + x^2) - 3 \cdot (2x) + 6 \cdot (2)] = e^{-x}(13 - 6x + x^2)$

The function h in $P_{3,-2}$ must have the form

        $e^{-2x}(a_0 + a_1 x + a_2 x^2)$

where $a_0, a_1, a_2$ are arbitrary constants and so the most general solution of our
equation is

        $f = e^{-x}(13 - 6x + x^2) + e^{-2x}(a_0 + a_1 x + a_2 x^2)$

It is easily verified that this is a solution, as guaranteed by Theorem 6.1.
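The recipe of Theorem 6.1 is mechanical enough to be written out as a short program. The sketch below is an illustration under stated assumptions: polynomials are coefficient lists, the function names are hypothetical, and only the polynomial factor multiplying $e^{cx}$ is tracked. It forms the operator polynomial (6.27) truncated after $D^{n-1}$, applies it to g, and then confirms the result by applying $D - bI$ to $e^{cx}q$ m times through the rule $(D - bI)(e^{cx}q) = e^{cx}((c - b)q + q')$, which follows from (5.4).

```python
from fractions import Fraction

def deriv(q):
    """Derivative of the polynomial q = [q0, q1, ...], padded to the same length."""
    return [k * q[k] for k in range(1, len(q))] + [Fraction(0)]

def mul_trunc(p, q, n):
    """Product of two polynomials in D, dropping all powers D^n and higher."""
    out = [Fraction(0)] * n
    for i, a in enumerate(p):
        for j, b in enumerate(q):
            if i + j < n:
                out[i + j] += a * b
    return out

def particular_solution(m, b, c, g):
    """Polynomial q with f_1 = e^{cx} q as in (6.29); requires b != c, deg g < len(g)."""
    n = len(g)
    a = Fraction(b - c)
    base = [-(1 / a) * (1 / a) ** k for k in range(n)]   # -(1/a)(1 + t/a + ... )
    op = [Fraction(1)] + [Fraction(0)] * (n - 1)
    for _ in range(m):                                   # raise to the m-th power
        op = mul_trunc(op, base, n)
    out = [Fraction(0)] * n
    power = list(g)
    for coeff in op:                                     # apply op(D) to g
        out = [s + coeff * t for s, t in zip(out, power)]
        power = deriv(power)
    return out

def apply_D_minus_bI(b, c, q):
    """Polynomial factor of (D - bI)(e^{cx} q), namely (c - b) q + q'."""
    return [(c - b) * s + d for s, d in zip(q, deriv(q))]

# Example 1: m = 3, b = -2, c = -1, g(x) = 1 + x^2
g = [Fraction(1), Fraction(0), Fraction(1)]
f1 = particular_solution(3, -2, -1, g)
check = f1
for _ in range(3):
    check = apply_D_minus_bI(-2, -1, check)
print([str(v) for v in f1], [str(v) for v in check])
```

The second list printed should reproduce the coefficients of g, which is the verification the example leaves to the reader.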


EXAMPLE 2  Solve $(D + 2I)^3 f = e^{-2x}(1 + x^2)$. Here we are in the case of Theorem
6.2, with m = 3, $b = -2$, $g(x) = 1 + x^2$. We must find a polynomial $g_1(x)$ such
that $D^3 g_1 = g$. That is easily done by the method of (6.10) and (6.11). Namely, by
“antidifferentiating” once we get $D(x + x^3/3) = 1 + x^2$. Repeating,
$D(x^2/2 + x^4/12) = x + x^3/3$. Repeating again, $D(x^3/6 + x^5/60) = x^2/2 + x^4/12$.
Therefore we can take $g_1(x) = x^3/6 + x^5/60$. By Theorem 6.2 the most general solution
of our equation is $f = e^{-2x}g_1 + h$, where h is in $P_{3,-2}$.
Hence,

        $f(x) = e^{-2x}(x^3/6 + x^5/60 + a_0 + a_1 x + a_2 x^2)$

where $a_0, a_1, a_2$ are arbitrary constants.
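The repeated “antidifferentiation” used here is the step (6.11) applied m times. A minimal sketch, assuming coefficient lists with exact rational arithmetic and a hypothetical helper name `antiderivative`:

```python
from fractions import Fraction

def antiderivative(g):
    """Polynomial h with Dh = g and zero constant term, as in (6.11)."""
    return [Fraction(0)] + [Fraction(a) / (k + 1) for k, a in enumerate(g)]

def deriv(h):
    """Derivative of a coefficient list (one entry shorter than h)."""
    return [k * h[k] for k in range(1, len(h))]

# g(x) = 1 + x^2, antidifferentiated three times to obtain g_1 with D^3 g_1 = g
g1 = [1, 0, 1]
for _ in range(3):
    g1 = antiderivative(g1)
print([str(a) for a in g1])      # coefficients of g_1, lowest degree first
```

Any polynomial of degree less than 3 may be added to the result without changing $D^3 g_1$; that freedom is absorbed by the term h of Theorem 6.2.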




REMARK.  The identity $p(D)(e^{cx}f) = e^{cx}p(D + cI)f$ of (5.8) can often be used to
simplify calculations. Consider for example the computation $(D + 6I)^4(x^3 e^{-7x})$.
Take $p(t) = (t + 6)^4$. Then our problem is to work out $p(D)(x^3 e^{-7x})$. By the
identity, this is equal to $e^{-7x}p(D - 7I)x^3$. Since $p(D - 7I) = (D - 7I + 6I)^4 =
(D - I)^4$, the answer is

        $e^{-7x}(D - I)^4 x^3 = e^{-7x}\left[D^4 - \binom{4}{1}D^3 + \binom{4}{2}D^2 - \binom{4}{3}D + I\right]x^3$
                            $= e^{-7x}(-4 \cdot 6 + 6 \cdot 6x - 4 \cdot 3x^2 + x^3)$
                            $= e^{-7x}(-24 + 36x - 12x^2 + x^3)$

It is worth comparing this method with the method of direct computation.
Another example: Compute $D^4[e^{6x}(3x^2 + 2x)]$.
Here take $p(t) = t^4$. The identity reads then

        $p(D)[e^{6x}(3x^2 + 2x)] = e^{6x}p(D + 6I)(3x^2 + 2x)$
                                $= e^{6x}(D + 6I)^4(3x^2 + 2x)$

Using the binomial theorem, we get

        $e^{6x}\left[D^4 + \binom{4}{1}6D^3 + \binom{4}{2}6^2 D^2 + \binom{4}{3}6^3 D + 6^4 I\right](3x^2 + 2x)$

from which the $D^4$ and $D^3$ terms can be dropped. The result is

        $e^{6x}(6 \cdot 6^2 D^2 + 4 \cdot 6^3 D + 6^4 I)(3x^2 + 2x) = 6^3 e^{6x}(14 + 36x + 18x^2)$


EXERCISES 


Compute the following by the method indicated above. 


1. (D + 21) (ate) 2, (D + 21)(ate“*) 
3. (D — 21 (xe) 4. (D + 212) 
5. Die (az + 2x? - 1)] 6. (D — T)8(rie™ 4 abe) 
Find the most general solutions of the following differential equations: 
7. DY = 327 4+2 8. (D + 21°F = xe“ 
9. DY = xe" 10, (D + 24 = 2 
11. (D - 4D = 0 12. (D — 2Uyf = ate” + (2? + Le 


7. Finding particular and general solution 


We now turn to the general linear differential equation with constant coefficients. 
As was remarked at the beginning of Sec. 6, any such equation can be put in the 
form 


7.1     $p(D)f = g$

where p stands for a polynomial:

7.2     $p(t) = a_0 + a_1 t + \cdots + a_{n-1} t^{n-1} + t^n$

[No loss of generality results from assuming that the highest coefficient $a_n$ of p(t) is
1, for, if it were not, we would merely have to divide (7.1) by $a_n$.] If $b_1, b_2, \ldots, b_r$
are the distinct roots of p(t), then

7.3     $p(t) = (t - b_1)^{m_1} (t - b_2)^{m_2} \cdots (t - b_r)^{m_r}$

where $m_1, m_2, \ldots, m_r$ are positive integers [their sum must be n = deg p(t)].
Then (7.1) can also be written

7.4     $(D - b_1 I)^{m_1} (D - b_2 I)^{m_2} \cdots (D - b_r I)^{m_r} f = g$

since the substitution $t \to D$ produces a homomorphism of the polynomial ring
C[t] into the ring of endomorphisms of the vector space F of indefinitely differentiable
functions. We observe that if g(x) in (7.4) is indefinitely differentiable, then
any solution f of the equation must also be indefinitely differentiable (see footnote †,
page 359).

We shall consider (7.1) or (7.4) only for special types of functions g(x). Namely,
we shall assume that g(x) can be expressed as a sum

7.5     $g(x) = e^{c_1 x}g_1(x) + \cdots + e^{c_s x}g_s(x)$

where each $g_i(x)$ is a polynomial. Any such g(x) is indefinitely differentiable, as
we have seen.

The analysis of (7.1) can be broken into a number of simple steps. First of all, if
f and $f_0$ are two solutions of (7.1), then for their difference $h = f - f_0$ we have
$p(D)h = p(D)(f - f_0) = p(D)f - p(D)f_0 = g - g = 0$. Hence h is a solution of
the so-called homogeneous equation:

7.6     $p(D)h = 0$

In other words, h is in the kernel of the operator p(D). Conversely, if h is a solution
of (7.6) and if $f_0$ is a solution of (7.1), then $f = f_0 + h$ is also a solution of (7.1).

All solutions of (7.1) are obtained from any given solution $f_0$ by adding to $f_0$ a solution
of the homogeneous equation (7.6).

A specific solution of (7.1) is sometimes called a “particular solution.”

The next observation is that, if g(x) has the required form (7.5) and if $f_i$ is a
solution of

7.7     $p(D)f_i = e^{c_i x}g_i(x)$

for i = 1, 2, ..., s, then the function $f_0 = f_1 + f_2 + \cdots + f_s$ is a solution of
(7.1). For then

        $p(D)f_0 = p(D)(f_1 + \cdots + f_s)$
                 $= p(D)f_1 + \cdots + p(D)f_s$
                 $= e^{c_1 x}g_1(x) + \cdots + e^{c_s x}g_s(x) = g(x)$

Therefore, to find a solution of (7.1) with g(x) of the form (7.5), we have only to
add up solutions of Eq. (7.7). Each such equation is of the type


7.8     $p(D)f = e^{cx}g(x)$    (g(x) a polynomial)

Write this as

7.9     $(D - b_1 I)^{m_1} (D - b_2 I)^{m_2} \cdots (D - b_r I)^{m_r} f = e^{cx}g(x)$

The basic observation here is that if we put

        $v_1(x) = (D - b_2 I)^{m_2} \cdots (D - b_r I)^{m_r} f$

then $(D - b_1 I)^{m_1} v_1(x) = e^{cx}g(x)$, and $v_1(x)$ can be found from Theorem 6.1 or 6.2.
Repeating this process r times, we finally get f. To do this methodically, set

7.10    $v_k = (D - b_{k+1} I)^{m_{k+1}} \cdots (D - b_r I)^{m_r} f$

for k = 0, ..., r. In particular, then, $v_0 = p(D)f = e^{cx}g(x)$, and $v_r = f$. Furthermore,
$(D - b_k I)^{m_k} v_k = v_{k-1}$. Writing this out we get the following series of equations:

        $(D - b_1 I)^{m_1} v_1 = v_0 = e^{cx}g(x)$
        $(D - b_2 I)^{m_2} v_2 = v_1$
7.11    $(D - b_3 I)^{m_3} v_3 = v_2$
        . . . . . . . . . . . . . .
        $(D - b_r I)^{m_r} v_r = v_{r-1}$    ($v_r = f$)

These equations are all of the same type, and by solving them successively for
$v_1, v_2, \ldots, v_{r-1}, v_r = f$, we finally obtain a solution of (7.9). Starting with the
first equation we find $v_1$ from Theorem 6.1 (or else from Theorem 6.2, if $c = b_1$).
In either case, the most general solution must have the form

        $v_1(x) = e^{cx}g_0(x) + e^{b_1 x}g_1(x)$

where $g_0$ and $g_1$ are polynomials, where the first term is a particular solution of the
equation, and where the second term is a solution of the homogeneous equation
$(D - b_1 I)^{m_1} h = 0$. Thus, $g_1$ has degree < $m_1$. And if g(x) = 0, we can take $g_0 = 0$.
Then the second equation of (7.11) is

        $(D - b_2 I)^{m_2} v_2 = e^{cx}g_0(x) + e^{b_1 x}g_1(x)$

and it can be solved by Theorem 6.1 (or by Theorem 6.2 if $c = b_2$), the two terms
on the right being treated separately. It follows then that the most general solution
is of the form

        $v_2 = e^{cx}h_0(x) + e^{b_1 x}h_1(x) + e^{b_2 x}h_2(x)$

where the $h_i(x)$ are polynomials, with deg $h_1 < m_1$ and deg $h_2 < m_2$. Here the first
term stands for a particular solution of the equation

        $(D - b_2 I)^{m_2} v = e^{cx}g_0(x)$

and may be taken as zero if $g_0 = 0$.
Continuing in this way with the third equation, etc., we finally arrive at $v_r = f$,
and we find that f must have the form


7.12    $f = e^{cx}w_0(x) + e^{b_1 x}w_1(x) + \cdots + e^{b_r x}w_r(x)$


where $w_1, w_2, \ldots, w_r$ are polynomials, with deg $w_1 < m_1$, deg $w_2 < m_2$, ...,
deg $w_r < m_r$, and where the first term is a particular solution of the equation
$p(D)f = e^{cx}g(x)$. If g(x) = 0, then we can assume that $w_0(x) = 0$. Hence we have
the following theorem:


THEOREM 7.1  The most general solution h of the homogeneous equation (7.6) has the
form

7.13    $h = h_1 + h_2 + \cdots + h_r$

where $h_i$ is an element of $P_{m_i, b_i}$ (i = 1, ..., r). That is, $h_i = e^{b_i x} \times$ (polynomial
of degree < $m_i$). Any such function is a solution of (7.6).
Collecting our other results above, we can state the theorem as follows: 


THEOREM 7.2  Every solution of the differential equation (7.1) is obtained from any
particular solution $f_0$ by adding to it a solution h of the homogeneous equation. If the
right member g of (7.1) has the form (7.5), then (7.1) has a solution, and every solution
must again be a sum of polynomials multiplied by exponentials. A particular solution
can be obtained by repeated applications of Theorems 6.1 and 6.2.


REMARK 1.  If c in (7.8) is different from all the b's, then only Theorem 6.1 is
used in solving (7.11). It follows from that theorem that one can find a particular
solution $e^{cx}w_0(x)$ of (7.8) in which deg $w_0$ = deg g. This useful observation is the
basis for the method of undetermined coefficients, which amounts to writing down
a polynomial of the proper degree, with literal coefficients, for $w_0$ and then
determining the coefficients so as to satisfy the differential equation. The same
method is available even if c is equal to one of the b's, say, $c = b_1$. In solving the
first equation of (7.11) it is necessary to use Theorem 6.2. One can affirm then
that only the polynomial called $g_0(x)$ above in $v_1$ can be taken to have degree equal
to $m_1$ + deg g(x). The remaining equations in (7.11) will then require only
Theorem 6.1 and, as before, we conclude that a particular solution $e^{cx}w_0(x)$ of
(7.8) can be found with deg $w_0$ = $m_1$ + deg g(x). An example of the method of
undetermined coefficients is given below. It is often swifter than Theorems 6.1
and 6.2.


REMARK 2.  The general solution of (7.1) involves an arbitrary solution h of the
homogeneous equation (7.6). From Theorem 7.1 it follows that h contains
$m_1 + m_2 + \cdots + m_r = n$ arbitrary constants, namely, the coefficients appearing
in the polynomials. Thus the general solution of (7.1) contains n constants which
can be specified at will. In most applications of differential equations, those
constants, usually called constants of integration, must be determined in such a way
that the corresponding solution fits other data. An important problem of this
type, the initial-value problem, involves determining the constants of integration
so that $f(x), f'(x), \ldots, f^{(n-1)}(x)$ will all have prescribed values for some
given value of x, say $x_0$. Then f is required to satisfy n equations $f(x_0) = y_0$,
$f'(x_0) = y_1$, ..., $f^{(n-1)}(x_0) = y_{n-1}$, where $y_0, y_1, \ldots, y_{n-1}$ are given numbers.
It is not hard to see that these equations enable one to determine the constants of
integration uniquely.
EXAMPLE 1  Solve $\frac{d^2 f}{dx^2} - 3\frac{df}{dx} + 2f = e^x$. Our equation is $p(D)f = e^x$, with
$p(t) = t^2 - 3t + 2 = (t - 1)(t - 2)$. In the notation of (7.9), $b_1 = 1$, $b_2 = 2$, c = 1,
g(x) = 1. The system (7.11) is

        $(D - I)v_1 = e^x$
        $(D - 2I)v_2 = v_1$

We need only find particular solutions of these equations, since the general solution
of our problem is then obtained by adding on an arbitrary solution of p(D)h = 0.
To solve the first equation we use Theorem 6.2, which tells us that a solution of it is
$e^x \cdot g_1(x)$, where $g_1(x)$ is any polynomial such that $Dg_1 = 1$. We take $g_1 = x$,
hence $v_1 = xe^x$. The second equation is $(D - 2I)v_2 = xe^x$. By Theorem 6.1, Eq.
(6.29), a particular solution is $v_2 = e^x[-I - D]x = -e^x \cdot (x + 1)$. By (7.11), $v_2$ is
a solution of our equation, as is easily verified. By Theorems 7.1 and 7.2, the
general solution is

        $f = -e^x(x + 1) + ae^x + be^{2x}$

where a, b are arbitrary constants. Writing $a' = a - 1$, we have

        $f = -xe^x + a'e^x + be^{2x}$

where a', b are arbitrary.

If f is required to satisfy the initial conditions f(0) = 1, f'(0) = 0, then we get

        $f(0) = a' + b = 1$
        $f'(0) = -1 + a' + 2b = 0$

whence b = 0, a' = 1. The required solution is $-xe^x + e^x$.
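Determining the constants of integration from initial conditions amounts to solving a small linear system. The sketch below is only an illustration: the helper names are hypothetical, Cramer's rule is one possible way to solve the 2 × 2 system, and the final check tracks the polynomial factor q of $e^x q(x)$ using $D(e^{cx}q) = e^{cx}(cq + q')$ from (5.4).

```python
from fractions import Fraction

def solve_2x2(a11, a12, a21, a22, r1, r2):
    """Solve a11*x + a12*y = r1, a21*x + a22*y = r2 by Cramer's rule."""
    det = Fraction(a11 * a22 - a12 * a21)
    if det == 0:
        raise ValueError("the conditions do not determine the constants uniquely")
    x = (r1 * a22 - a12 * r2) / det
    y = (a11 * r2 - a21 * r1) / det
    return x, y

def D_exp_poly(c, q):
    """Polynomial factor of D(e^{cx} q), namely c*q + q'."""
    dq = [k * q[k] for k in range(1, len(q))] + [0]
    return [c * s + d for s, d in zip(q, dq)]

# f = -x e^x + a' e^x + b e^{2x}:  f(0) = a' + b = 1,  f'(0) = -1 + a' + 2b = 0
a_prime, b = solve_2x2(1, 1, 1, 2, 1, 1)

# With b = 0 the solution is e^x (a' - x); check that f'' - 3f' + 2f = e^x
q = [a_prime, Fraction(-1)]
q1 = D_exp_poly(1, q)
q2 = D_exp_poly(1, q1)
residual = [s2 - 3 * s1 + 2 * s0 for s0, s1, s2 in zip(q, q1, q2)]
print(a_prime, b, residual)      # residual holds the polynomial factor of the right side
```

For an equation of order n the same idea leads to an n × n system in the n constants, as described in Remark 2.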


EXAMPLE 2  Solve $f''' - 3f' + 2f = xe^{-x} + x^2$. The equation is p(D)f = g, with
$p(t) = t^3 - 3t + 2 = (t - 1)^2(t + 2)$. According to our general method we first
find particular solutions of

7.14    $(D - I)^2(D + 2I)f_1 = xe^{-x}$

and

7.15    $(D - I)^2(D + 2I)f_2 = x^2$

For the first one we have, in the notation of (7.9), $b_1 = 1$, $b_2 = -2$, $c = -1$, g(x) = x.
The system (7.11) becomes

        $(D - I)^2 v_1 = xe^{-x}$
        $(D + 2I)v_2 = v_1$

From (6.29) we can take $v_1 = e^{-x}[-\frac{1}{2}(I + D/2)]^2 x = \frac{1}{4}e^{-x}[I + D + D^2/4]x =
\frac{1}{4}e^{-x}(x + 1)$. Applying (6.29) again, we get

        $v_2 = e^{-x}[I - D]\,\frac{1}{4}(x + 1) = \frac{1}{4}xe^{-x}$

This is a particular solution of (7.14).
For (7.15) we have $b_1 = 1$, $b_2 = -2$, c = 0, $g(x) = x^2$, in the notation of (7.9).
The system (7.11) is

        $(D - I)^2 u_1 = x^2$
        $(D + 2I)u_2 = u_1$

By (6.29) we can take $u_1 = [-(I + D + D^2)]^2(x^2) = (I + 2D + 3D^2)x^2 = x^2 + 4x + 6$.
Applying (6.29) to the equation for $u_2$, we obtain similarly
$u_2 = \frac{1}{2}(I - D/2 + D^2/4)(x^2 + 4x + 6) = \frac{1}{2}(x^2 + 3x + 9/2)$, which is a particular
solution of (7.15). The general solution of the original equation is therefore

        $f(x) = \frac{1}{4}xe^{-x} + \frac{1}{2}(x^2 + 3x + 9/2) + e^x(ax + b) + ce^{-2x}$

where a, b, c are arbitrary constants.

To solve the equation by the method of undetermined coefficients, we infer from
Remark 1 that the equation $f''' - 3f' + 2f = xe^{-x}$ must have a particular solution
of the form $y_1 = (a_0 + a_1 x)e^{-x}$. Substituting this in the differential equation one
easily finds that $a_0 = 0$, $a_1 = 1/4$. Similarly, the equation $f''' - 3f' + 2f = x^2$ must
have a particular solution of the form $y_2 = c_0 + c_1 x + c_2 x^2$. Substituting into the
equation one obtains immediately $c_0 = 9/4$, $c_1 = 3/2$, $c_2 = 1/2$.
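For a polynomial right-hand side, the method of undetermined coefficients reduces to a triangular system that can be solved from the top coefficient down. The following sketch is an illustration only, with a hypothetical function name; it assumes that 0 is not a root of p (so that c = 0 differs from all the b's, as in Remark 1) and uses the fact that the coefficient of $x^k$ in $D^j y$ is $\frac{(k+j)!}{k!}$ times the coefficient of $x^{k+j}$ in y.

```python
from fractions import Fraction
from math import factorial

def undetermined_coefficients(p, g):
    """Polynomial y with deg y = deg g and p(D) y = g.

    p = [p0, p1, ...] are the coefficients of p(t); g = [g0, g1, ...] those of g(x).
    Requires p0 != 0, i.e. that 0 is not a root of p.
    """
    if p[0] == 0:
        raise ValueError("0 is a root of p; a trial polynomial of higher degree is needed")
    d = len(g)
    y = [Fraction(0)] * d
    for k in reversed(range(d)):                 # highest coefficient first
        s = Fraction(g[k])
        for j in range(1, len(p)):
            if k + j < d:
                # contribution of p_j D^j y to the coefficient of x^k
                s -= p[j] * Fraction(factorial(k + j), factorial(k)) * y[k + j]
        y[k] = s / p[0]
    return y

# p(t) = t^3 - 3t + 2 and g(x) = x^2, as in the second half of the example
y2 = undetermined_coefficients([2, -3, 0, 1], [0, 0, 1])
print([str(c) for c in y2])      # c_0, c_1, c_2
```

The exponential case $xe^{-x}$ can be reduced to this one through (5.8), by replacing p(t) with p(t − 1) and solving for the polynomial factor.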


EXAMPLE 3  Solve $f'' + f = 0$. Our equation is $(D^2 + I)f = 0$, or
$(D + iI)(D - iI)f = 0$. By Theorem 7.1 the general solution is $f = ae^{-ix} + be^{ix}$, where a, b are
arbitrary constants.


EXERCISES 

Find the general solutions of the following differential equations. Use both the 

method of Theorem 6.1 (or 6.2) and the method of undetermined coefficients to 
find particular solutions in Exercises 7 to 10. 


af 


2 pyno 2 paft 
a pap ee 

5 Papen 

eb pee 




af | ef — . 
Seta PL fe we trem 


10. a +fol + 2e7 + ee 
Uf +f = 14 26 +307 
wer pet prt pert] 


13. f" — 8 + Of = (det + Bx? + Det te $2 
Find the solution of this satisfying the initial conditions 
FO) = 1, £0) = 0, 770) = -1 
14, Find the solution of Exercise 12 satisfying the initial conditions 
£0) = 0, f'0) = 0, 770) = 0, 7°70) = 1 


8. Trigonometric functions


In the preceding section we have seen how to solve any linear differential equation
(7.1) with constant coefficients, provided that the function g(x) on the right is a
sum of functions

8.1     $g(x) = e^{c_1 x}g_1(x) + \cdots + e^{c_s x}g_s(x)$

where each $g_i(x)$ is a polynomial. The sum and product of two functions of this
type are again of the same type, and it follows easily that all functions of the type
(8.1) form a subring E of the ring F of indefinitely differentiable complex-valued
functions on R. E is also a subspace of F regarded as a vector space over C.

We wish to point out here that the ring E contains the functions sin ax, cos ax
(a any complex number); consequently E contains any function which can be built
from polynomials and from sin ax, cos bx, $e^{cx}$ by any finite number of additions,
subtractions, and products.
We recall (Sec. 4, Chap. 5) that the exponential function is defined by the infinite
series

        $e^z = 1 + z + \frac{z^2}{2!} + \frac{z^3}{3!} + \cdots$

for any complex number z. Then

        $e^{iz} = 1 + iz + \frac{(iz)^2}{2!} + \frac{(iz)^3}{3!} + \cdots$
               $= \left(1 - \frac{z^2}{2!} + \frac{z^4}{4!} - \cdots\right) + i\left(z - \frac{z^3}{3!} + \frac{z^5}{5!} - \cdots\right)$

It is known from calculus† that

        $\sin z = z - \frac{z^3}{3!} + \frac{z^5}{5!} - \cdots$
        $\cos z = 1 - \frac{z^2}{2!} + \frac{z^4}{4!} - \cdots$

Comparing the results, we have Euler's formula

        $e^{iz} = \cos z + i \sin z$


† In calculus these formulas are proved for real z. The series converge for all z, real or not,
and are used to define cos z and sin z for complex z.


Replacing z by $-z$ and observing that $\cos(-z) = \cos z$ and $\sin(-z) = -\sin z$,
we have also

        $e^{-iz} = \cos z - i \sin z$

Adding the two formulas, we obtain

8.2     $\cos z = \frac{1}{2}(e^{iz} + e^{-iz})$

Subtracting the two formulas gives us

8.3     $\sin z = \frac{1}{2i}(e^{iz} - e^{-iz})$

Putting ax for z, a being any complex number:

8.4     $\cos ax = \frac{1}{2}(e^{iax} + e^{-iax})$
        $\sin ax = \frac{1}{2i}(e^{iax} - e^{-iax})$

This shows that cos ax and sin ax are indeed in the ring E of all functions of the
form (8.1). From (5.2) applied to (8.4) we have the differentiation formulas

8.5     $D \cos ax = -a \sin ax$
        $D \sin ax = a \cos ax$

Thus $D^2 \cos ax = D(-a \sin ax) = -a^2 \cos ax$. Similarly, $D^2 \sin ax = -a^2 \sin ax$.

In particular, sin x and cos x are both solutions of the differential equation $f'' + f = 0$.
From Example 3, Sec. 7, every solution of this equation must be of the form
$ae^{-ix} + be^{ix}$, which we can also write $a(\cos x - i \sin x) + b(\cos x + i \sin x) =
a' \cos x + b' \sin x$, where $a' = a + b$ and $b' = i(b - a)$.
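Formulas (8.2) and (8.3) can be tested numerically for complex arguments. The sketch below is an illustration, not part of the text: it sums the exponential series directly (the number of terms, 40, is an arbitrary choice adequate for small |z|) and compares the resulting cosine and sine with the standard-library functions in `cmath`.

```python
import cmath

def exp_series(z, terms=40):
    """Partial sum 1 + z + z^2/2! + ... of the exponential series."""
    total, term = 0j, 1 + 0j
    for k in range(terms):
        total += term
        term *= z / (k + 1)          # next term z^(k+1)/(k+1)!
    return total

def cos_via_exp(z):
    """cos z from (8.2)."""
    return (exp_series(1j * z) + exp_series(-1j * z)) / 2

def sin_via_exp(z):
    """sin z from (8.3)."""
    return (exp_series(1j * z) - exp_series(-1j * z)) / (2j)

z = 0.7 + 0.3j                       # an arbitrary complex test point
print(abs(cos_via_exp(z) - cmath.cos(z)), abs(sin_via_exp(z) - cmath.sin(z)))
```

Both differences should be at the level of floating-point rounding; Euler's formula itself corresponds to `exp_series(1j * z)` agreeing with `cos_via_exp(z) + 1j * sin_via_exp(z)`.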

Since the differential equation (7.1) can be solved by the methods presented
above for any function g(x) in the ring E, it follows in particular that (7.1) can be
solved if the right member g(x) is any product of polynomials, sines, cosines, and
exponentials.


Example 1 Solve ’ ~ 3f = sin x. We can write this as f’ ~ 8f = (1/2i)e" — 
(1/2de-'*, It is obvious by inspection that ae‘ should be a particular solution of 


Trigonometrie functions 369 


the equation f’ ~ 3f = (1/2i)e* for suitable a. Substituting it in we find a = 
(=1 + 31/20, Similarly, a particular solution of the equation f’ — 8f = (1/2i)e~'* 
is [(~1 ~ 32)/20\e-*, Hence, a particular solution of /’ — 3f = sin x is 
-143i, 143% 
a0 ° 20° 


135 ((-1 + 82)(cos x + tsin x) ~ (1. + 8i}{cos x ~ isin x)] 
= '$o (-2 cosx ~ 6sinz) = —1{q cosx — 349 sina 

A result of this form was rather obvious to begin with, in view of the formulas
(8.5). It would therefore have been possible to avoid imaginary numbers altogether
by trying to find a particular solution of the form $a \cos x + b \sin x$. Substitution
in the equation would have led at once to the answer above. The general
solution of our equation is

$f = -\tfrac{1}{10} \cos x - \tfrac{3}{10} \sin x + ce^{3x}$

c an arbitrary constant.
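
The general solution just obtained can be checked numerically. The following is only a sketch: the function names `f` and `residual` are illustrative, and the derivative is approximated by a central difference rather than computed symbolically.

```python
import math

def f(x, c=0.0):
    """Candidate general solution: -cos(x)/10 - 3 sin(x)/10 + c e^{3x}."""
    return -math.cos(x) / 10 - 3 * math.sin(x) / 10 + c * math.exp(3 * x)

def residual(x, c=0.0, h=1e-5):
    """Value of f'(x) - 3 f(x) - sin(x), with f' taken as a central difference."""
    df = (f(x + h, c) - f(x - h, c)) / (2 * h)
    return df - 3 * f(x, c) - math.sin(x)

# The residual should be close to zero for any x and any constant c.
for x, c in [(0.7, 0.0), (-1.2, 0.5), (2.0, -0.01)]:
    print(x, c, residual(x, c))
```

A residual near zero for several choices of x and c is consistent with the computation in the example, though it is of course not a proof.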


EXAMPLE 2 Solve $f''' - 8f = x^2 \sin x \cos 2x$. The methodical procedure is to
write the equation in the form

8.6    $(D^3 - 8I)f = x^2 \cdot \frac{1}{2i}(e^{ix} - e^{-ix}) \cdot \frac{1}{2}(e^{2ix} + e^{-2ix})$
       $\qquad = \frac{1}{4i}(x^2 e^{3ix} - x^2 e^{ix} + x^2 e^{-ix} - x^2 e^{-3ix})$

The operator on the left factors into

$(D - 2I)(D - \alpha I)(D - \beta I) = p(D)$

where $\alpha = -1 + \sqrt{3}\,i$, $\beta = -1 - \sqrt{3}\,i$. We can now proceed as in Sec. 7. For
example, to find a particular solution of $p(D)f_1 = (1/4i)x^2 e^{3ix}$, the system (7.11)
becomes

$(D - 2I)v_1 = \frac{1}{4i} x^2 e^{3ix}$
$(D - \alpha I)v_2 = v_1$
$(D - \beta I)v_3 = v_2$

A solution $f_1 = v_3$ is obtained by repeated applications of the formula (6.27).
Doing the same thing for the other three terms on the right of our Eq. (8.6), we
obtain finally a particular solution of (8.6). We omit the details.




EXERCISES 
Find the general solution: In all cases express the answer without any imaginary 
exponentials. 




: : 
pe coee tine 2 poe erz 


df -f 


3. = = xe" sin x 4 a = e cos 2x 


1 


Gf 4 f= sine + sin' 2 6 Lo sinn s 
ay 
* drt 
@s 
a 


+f = & cos 32 


10. — 2f = 2sine 


9. Systems of equations 


Virtually no knowledge of calculus was required in the foregoing sections. Here 
we shall have to assume that the reader has some acquaintance with differential 
and integral calculus. In this section we shall take up briefly a topic closely re- 
lated to the homogeneous equation (7.6), but considerably more general. 

Let $a_{ij}(t)$ be complex-valued functions on R for i, j = 1, 2, ..., n (we use t
rather than x as independent variable here), and consider the system of equations

9.1    $\frac{dx_1}{dt} = a_{11}x_1 + a_{12}x_2 + \cdots + a_{1n}x_n$
       $\frac{dx_2}{dt} = a_{21}x_1 + a_{22}x_2 + \cdots + a_{2n}x_n$
       $\cdots$
       $\frac{dx_n}{dt} = a_{n1}x_1 + a_{n2}x_2 + \cdots + a_{nn}x_n$

By a solution of this system we shall mean an n-tuple of complex-valued differentiable
functions $x_1(t), \ldots, x_n(t)$ on R such that (9.1) holds for all values t. If
n = 1, then the system (9.1) reduces to the simple equation

$\frac{dx}{dt} = a(t)\,x$

This can be solved as follows, if a(t) is continuous:

$\frac{dx}{x} = a\,dt$

whence

$\log x = \int_{t_0}^{t} a\,dt + \text{const}$

$x(t) = b \cdot e^{\int_{t_0}^{t} a\,dt}$




b denoting an arbitrary constant. We shall see that an analogous result holds for
n > 1 under certain assumptions concerning the matrix† $a(t) = (a_{ij}(t))$ whose
coefficients appear in (9.1).

For each real number t, a(t) is an n × n matrix with complex coefficients, and
so a is a mapping $R \to C_{n,n}$, the set of all n × n matrices with complex coefficients
(we write $C_{n,n}$ instead of $C^n_n$, since we are using lower indices). Similarly, $x(t) =
(x_1(t), x_2(t), \ldots, x_n(t))$ is a mapping from R to the vector space of n-tuples of
complex numbers. We call a(t) a matrix-valued function on R, and we call x(t) a
vector-valued function on R.

The matrix-valued function a(t) is called differentiable if all its coefficients $a_{ij}(t)$
are differentiable. We denote by da/dt the matrix whose coefficients are $da_{ij}/dt$.

Similarly, the vector-valued function x(t) is called differentiable if all its
coefficients $x_i(t)$ are differentiable, and we write $dx/dt = (dx_1/dt, \ldots, dx_n/dt)$.
Throughout this section we consider x, dx/dt, etc., as column vectors, although we
shall often write them as row vectors for compactness of notation. From the
definition of matrix multiplication the system (9.1) can be written


9.2 dx/dt = ax 


Just as we can define the derivative of a matrix-valued or vector-valued function,
we can define the integral. Namely, we put

$\int_{t_0}^{t} a\,dt = \left( \int_{t_0}^{t} a_{ij}\,dt \right)$

that is, $\int a\,dt$ is the matrix whose entries are $\int a_{ij}\,dt$, this for any continuous
matrix-valued function (i.e., whose coefficients are continuous). Similarly if x(t) is a
continuous vector-valued function, we define

$\int_{t_0}^{t} x\,dt$

to be the vector whose components are

$\int_{t_0}^{t} x_i\,dt \qquad (i = 1, \ldots, n)$

It is possible to define the exponential of any square matrix with complex
coefficients. Thus, if b is such a matrix, then the finite sum

$I + b + \frac{1}{2!} b^2 + \cdots + \frac{1}{m!} b^m$

is certainly defined. It can be shown that the coefficients of this matrix, for
m = 1, 2, 3, ..., form Cauchy sequences and consequently converge as $m \to \infty$


† We shall depart from our index conventions here. Only lower indices will be used, and
in the matrix $(a_{ij})$ the first subscript is the row index, the second subscript is the column
index.




to some limit. The matrix having these limiting values as coefficients is denoted
by $e^b$. Just as for 1 × 1 matrices (scalars), we write

9.3    $e^b = I + b + \frac{1}{2!} b^2 + \cdots + \frac{1}{m!} b^m + \cdots$

To sketch how this can be worked out,† consider a 2 × 2 matrix b whose eigenvalues
(Chap. 11) are $\alpha$, $\beta$. Then the characteristic polynomial $\phi(y)$ of b is
$(y - \alpha)(y - \beta)$. Divide this polynomial into $y^m$. From the division algorithm we
get a remainder of degree $\le 1$, or else zero. Suppose, say,

9.4    $y^m = q(y)(y - \alpha)(y - \beta) + cy + d$

Putting successively $y = \alpha$, $y = \beta$ in this equation we find

$\alpha^m = c\alpha + d$
$\beta^m = c\beta + d$

Solving for c and d, we obtain $c = (\alpha^m - \beta^m)/(\alpha - \beta)$ and $d = (\beta\alpha^m - \alpha\beta^m)/(\beta - \alpha)$,
assuming $\alpha \ne \beta$, of course. Substituting these values in (9.4) and putting
b for y, there results

9.5    $b^m = \frac{\alpha^m - \beta^m}{\alpha - \beta}\, b + \frac{\beta\alpha^m - \alpha\beta^m}{\beta - \alpha}\, I$

by the Cayley-Hamilton theorem.‡ Putting this in (9.3), there results

$e^b = \sum_{m=0}^{\infty} \frac{1}{m!} b^m = \frac{e^{\alpha} - e^{\beta}}{\alpha - \beta}\, b + \frac{\beta e^{\alpha} - \alpha e^{\beta}}{\beta - \alpha}\, I$


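For example, for the matrix $\begin{pmatrix} 0 & 1 \\ 1 & 0 \end{pmatrix}$ the eigenvalues are $\pm 1$, and so

$e^b = \tfrac{1}{2}(e - e^{-1}) \cdot \begin{pmatrix} 0 & 1 \\ 1 & 0 \end{pmatrix} + \tfrac{1}{2}(e + e^{-1}) \cdot \begin{pmatrix} 1 & 0 \\ 0 & 1 \end{pmatrix}$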
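
The closed form for the exponential of a 2 × 2 matrix with distinct eigenvalues can be compared with partial sums of the series (9.3). The sketch below is illustrative only: the helper names `mat_mul`, `exp_series`, and `exp_2x2` are assumptions made here, the eigenvalues are obtained from the trace and determinant, and 40 terms of the series is an arbitrary cutoff that is adequate for small matrices.

```python
import cmath
import math

def mat_mul(a, b):
    """Product of two square matrices given as nested lists."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def exp_series(b, terms=40):
    """Partial sum I + b + b^2/2! + ... + b^terms/terms! of the series (9.3)."""
    n = len(b)
    total = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    term = [row[:] for row in total]
    for m in range(1, terms + 1):
        # term now holds b^m / m!
        term = [[v / m for v in row] for row in mat_mul(term, b)]
        total = [[total[i][j] + term[i][j] for j in range(n)] for i in range(n)]
    return total

def exp_2x2(b):
    """Closed form e^b = c*b + d*I, valid when the two eigenvalues differ."""
    trace = b[0][0] + b[1][1]
    det = b[0][0] * b[1][1] - b[0][1] * b[1][0]
    root = cmath.sqrt(trace * trace - 4 * det)
    alpha, beta = (trace + root) / 2, (trace - root) / 2
    if alpha == beta:
        raise ValueError("the formula assumes distinct eigenvalues")
    c = (cmath.exp(alpha) - cmath.exp(beta)) / (alpha - beta)
    d = (beta * cmath.exp(alpha) - alpha * cmath.exp(beta)) / (beta - alpha)
    return [[c * b[0][0] + d, c * b[0][1]],
            [c * b[1][0], c * b[1][1] + d]]

print(exp_2x2([[0, 1], [1, 0]]))
print(exp_series([[0, 1], [1, 0]]))
```

For the matrix with rows (0, 1) and (1, 0) the eigenvalues are 1 and −1, so both routines should agree with the matrix whose diagonal entries are cosh 1 and whose off-diagonal entries are sinh 1.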


Returning to the general case of n × n matrices, from (9.3) we see that $e^I =
e \cdot I$, $e^0 = I$. One can prove (see exercises) that

9.6    $e^b \cdot e^c = e^{b+c}$    if bc = cb

9.7    $e^b$ is nonsingular for any b

† Another method is described in the exercises.
‡ This gives an indication of how to make the computations involved in Theorem 4.5,
Chap. 11.


9.8    If b(t) is a differentiable matrix-valued function on R, then

$\frac{d}{dt} e^{b(t)} = \frac{db}{dt} \cdot e^{b(t)}$

provided $b(t_1)b(t_2) = b(t_2)b(t_1)$ for every pair of real numbers $t_1$, $t_2$.

9.9    If a(t) is a continuous matrix-valued function on R such that $a(t_1)a(t_2) =
a(t_2)a(t_1)$ for all $t_1$, $t_2$, then for the matrix

$b(t) = \int_{t_0}^{t} a\,dt$

we have $b(t_1)b(t_2) = b(t_2)b(t_1)$ for all $t_1$, $t_2$ and db/dt = a.
THEOREM 9.1 Let a(t) be a continuous n × n matrix-valued function on R such that

9.10    $a(t_1)a(t_2) = a(t_2)a(t_1)$

for all real numbers $t_1$, $t_2$. Let $c = (c_1, \ldots, c_n)$ be an n-tuple of complex numbers.
Then the system of differential equations

9.11    dx/dt = ax

(where x is regarded as a column vector) has as solution the vector-valued function
x(t) given by

9.12    $x(t) = e^{\int_{t_0}^{t} a\,dt} \cdot c$

and, moreover, $x(t_0) = c$.
For let u(t) denote the exponential in (9.12), so that

$x(t) = u(t) \cdot c$

Then for the ith component of x we have

$x_i(t) = \sum_{j=1}^{n} u_{ij}(t)\,c_j$

whence

$\frac{dx_i}{dt} = \sum_{j=1}^{n} \frac{du_{ij}}{dt}\,c_j$

showing that

9.13    $dx/dt = (du/dt) \cdot c$

By (9.8) we have

$du/dt = \left( \frac{d}{dt} \int_{t_0}^{t} a\,dt \right) u$

and by (9.9) this is

9.14    du/dt = au

Hence, from (9.13) and (9.14),

dx/dt = auc = ax

showing that x is a solution of (9.11). Putting $t = t_0$ in (9.12) gives us $x(t_0) =
e^0 \cdot c = Ic = c$.
EXAMPLE 1 Consider the system (9.1), assuming that the coefficients $a_{ij}$ are
constants. That is, the matrix-valued function a is constant on R. The hypotheses
(9.10) are clearly satisfied. The solution (9.12) is very simple. For

$\int_{t_0}^{t} a\,dt = (t - t_0)\,a$

clearly, and so (9.12) reduces to

$x = e^{(t - t_0)a} \cdot c$
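
For a constant matrix the solution formula can be tried out directly. The following sketch approximates the matrix exponential by a truncated series (an arbitrary 40 terms) and applies it to the initial vector; the names `solve_constant`, `exp_series`, and the sample matrix are illustrative assumptions, not part of the text.

```python
import math

def mat_mul(a, b):
    """Product of two square matrices given as nested lists."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def mat_vec(a, v):
    """Product of a square matrix with a column vector."""
    return [sum(a[i][k] * v[k] for k in range(len(v))) for i in range(len(a))]

def exp_series(b, terms=40):
    """Partial sum I + b + b^2/2! + ... of the exponential series."""
    n = len(b)
    total = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    term = [row[:] for row in total]
    for m in range(1, terms + 1):
        term = [[v / m for v in row] for row in mat_mul(term, b)]
        total = [[total[i][j] + term[i][j] for j in range(n)] for i in range(n)]
    return total

def solve_constant(a, c, t, t0=0.0):
    """x(t) = e^{(t - t0) a} c, the solution of dx/dt = a x with x(t0) = c."""
    scaled = [[(t - t0) * v for v in row] for row in a]
    return mat_vec(exp_series(scaled), c)

# Sample system dx1/dt = x2, dx2/dt = -2 x1 - 3 x2 with x(0) = (1, 0).
a = [[0.0, 1.0], [-2.0, -3.0]]
print(solve_constant(a, [1.0, 0.0], 1.0))
# Exact solution for comparison: x1 = 2e^{-t} - e^{-2t}, x2 = -2e^{-t} + 2e^{-2t}.
print([2 * math.exp(-1) - math.exp(-2), -2 * math.exp(-1) + 2 * math.exp(-2)])
```

The sample matrix has eigenvalues −1 and −2, which is why the exact solution used for comparison is a combination of $e^{-t}$ and $e^{-2t}$.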


A somewhat different method is as follows. Suppose that a has n linearly independent
eigenvectors $u_1, \ldots, u_n$, with corresponding eigenvalues $p_1, \ldots, p_n$
(this is so if the eigenvalues are distinct, or if a is symmetric and real, etc.). Expand
the (unknown) vector x in terms of the vectors $u_i$, say

$x(t) = \sum_{i=1}^{n} y_i(t)\,u_i$

Then

$\frac{dx}{dt} = \sum_{i=1}^{n} \frac{dy_i}{dt}\,u_i$

and

$ax = \sum_{i=1}^{n} y_i(t) \cdot au_i = \sum_{i=1}^{n} p_i y_i u_i$

Equating these we obtain

$\frac{dy_i}{dt} = p_i y_i \qquad (i = 1, \ldots, n)$

whence

$y_i = c_i e^{p_i t}$

the $c_i$ being arbitrary constants. Hence as a solution x we get

$x(t) = \sum_{i=1}^{n} c_i e^{p_i t}\,u_i$




EXAMPLE 2 Consider a homogeneous linear equation

9.15    $\frac{d^n x}{dt^n} + a_{n-1}\frac{d^{n-1} x}{dt^{n-1}} + \cdots + a_1 \frac{dx}{dt} + a_0 x = 0$

for x(t). Define $x_1, x_2, \ldots, x_{n-1}$ by

$x_k = \frac{d^k x}{dt^k} \qquad (k = 1, \ldots, n - 1)$

Then we have

9.16    $dx/dt = x_1$
        $dx_1/dt = x_2$
        $\cdots$
        $dx_{n-1}/dt = -a_0 x - a_1 x_1 - \cdots - a_{n-1} x_{n-1}$

A solution of (9.15) gives a solution of the system (9.16). Conversely, as is easily
seen, if $x = (x, x_1, \ldots, x_{n-1})$ is a solution of (9.16), then x is a solution of (9.15).
If the coefficients of (9.15) are constants, then a solution can be obtained as in
Example 1.
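
The passage from a single nth-order equation to a first-order system amounts to writing down one matrix. The sketch below builds that matrix under an assumed ordering of the coefficients, namely `coeffs = [a_0, a_1, ..., a_{n-1}]` for the equation with leading coefficient 1; the function name `companion` is illustrative.

```python
def companion(coeffs):
    """Matrix of the first-order system equivalent to
    x^(n) + a_{n-1} x^(n-1) + ... + a_1 x' + a_0 x = 0,
    with coeffs = [a_0, a_1, ..., a_{n-1}] (assumed ordering).
    The unknown vector is (x, x', ..., x^(n-1)).
    """
    n = len(coeffs)
    # Rows 0 .. n-2 express d/dt of the k-th derivative as the (k+1)-th derivative.
    rows = [[1 if j == i + 1 else 0 for j in range(n)] for i in range(n - 1)]
    # The last row solves the original equation for the highest derivative.
    rows.append([-a for a in coeffs])
    return rows

# x'' + 3x' + 2x = 0 becomes d/dt (x, x') = a (x, x') with:
print(companion([2, 3]))
```

With constant coefficients this matrix can then be handed to any routine that computes the exponential of a constant matrix.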
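EXAMPLE 3 Consider the system

$dx_1/dt = -x_2$
$dx_2/dt = x_1$

Here the matrix a is

$a = \begin{pmatrix} 0 & -1 \\ 1 & 0 \end{pmatrix}$

We have $a^2 = -I$, $a^3 = -a$, $a^4 = I$, etc. Hence, referring to Example 1,

$e^{ta} = I + ta + \frac{1}{2!} t^2 a^2 + \frac{1}{3!} t^3 a^3 + \frac{1}{4!} t^4 a^4 + \cdots$
$\quad = I + ta - \frac{1}{2!} t^2 I - \frac{1}{3!} t^3 a + \frac{1}{4!} t^4 I + \cdots$
$\quad = \left(1 - \frac{1}{2!} t^2 + \frac{1}{4!} t^4 - \cdots\right) I + \left(t - \frac{1}{3!} t^3 + \frac{1}{5!} t^5 - \cdots\right) a$
$\quad = (\cos t)I + (\sin t)a$

Hence, as in Example 1, we get a solution $x(t) = (x_1, x_2)$ satisfying the initial
condition x(0) = c by putting

$x = e^{ta} \cdot c = [(\cos t)I + (\sin t)a]\,c$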




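or

$\begin{pmatrix} x_1 \\ x_2 \end{pmatrix} = \begin{pmatrix} \cos t & -\sin t \\ \sin t & \cos t \end{pmatrix} \begin{pmatrix} c_1 \\ c_2 \end{pmatrix}$

or

$x_1 = c_1 \cos t - c_2 \sin t$
$x_2 = c_1 \sin t + c_2 \cos t$

where $c_1$, $c_2$ are arbitrary constants.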
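
The identity $e^{ta} = (\cos t)I + (\sin t)a$ for this particular matrix a can be checked against a truncated exponential series. This is only a numerical sketch; the helper names and the 40-term cutoff are assumptions made for illustration.

```python
import math

def mat_mul(a, b):
    """Product of two square matrices given as nested lists."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def exp_series(b, terms=40):
    """Partial sum I + b + b^2/2! + ... of the exponential series."""
    n = len(b)
    total = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    term = [row[:] for row in total]
    for m in range(1, terms + 1):
        term = [[v / m for v in row] for row in mat_mul(term, b)]
        total = [[total[i][j] + term[i][j] for j in range(n)] for i in range(n)]
    return total

def rotation(t):
    """The matrix (cos t) I + (sin t) a for a with rows (0, -1), (1, 0)."""
    return [[math.cos(t), -math.sin(t)],
            [math.sin(t), math.cos(t)]]

t = 0.9
ta = [[0.0, -t], [t, 0.0]]
print(exp_series(ta))
print(rotation(t))
```

The two printed matrices should agree to within rounding error.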


EXERCISES 


1. Let E denote the vector space of all n × n matrices with real coefficients.
We make E into a euclidean vector space by defining the "length" of a matrix a =
$(a_{ij})$ in E to be

$|a| = \left( \sum_{i,j} a_{ij}^2 \right)^{1/2}$

Prove that for any two elements a, b in E,

$|a \cdot b| \le (2n)^{1/2} \cdot |a| \cdot |b|$

[Hint: If $c_1, \ldots, c_n$ are real numbers with $c_1 \ge c_2 \ge \cdots \ge c_n \ge 0$, we have

$\left( \sum_{i=1}^{n} c_i \right)^2 \le 2n \sum_{i=1}^{n} c_i^2$, by induction.]

2. Show that

$|a^k| \le (2n)^{k/2} \cdot |a|^k$

for any element a of E and any integer k > 0.

*3. For any matrix a in E show that the sequence $s_0, s_1, s_2, \ldots, s_k, \ldots$, with

$s_k = I + a + \frac{1}{2!} a^2 + \cdots + \frac{1}{k!} a^k$

is a Cauchy sequence. That is, show that for any positive real number d there is
an integer p such that $|s_h - s_k| < d$ for all h, k > p.

Prove further that there is a unique matrix s in E such that for any positive real
number d there is an integer p for which $|s_k - s| < d$ provided k > p. (We
naturally call s the limit of the sequence $s_0, s_1, \ldots$, and it is defined to be $e^a$.)

4. Prove (9.6). 

5. Prove (9.7). 

6. If a(t) is a differentiable matrix-valued function on R [a(t) being in E for all
real numbers t], prove that

$\frac{da}{dt} = \lim_{h \to 0} \frac{1}{h} [a(t + h) - a(t)]$

limit being defined as in Exercise 3. That is, show that for any real d > 0 there
exists a number p > 0 such that

$\left| \frac{da}{dt} - \frac{1}{h} [a(t + h) - a(t)] \right| < d$

for all nonzero h such that $-p < h < p$.
7. Prove (9.8). 
8. Prove (9.9). 
9. Find the solution of the system 
dxdt = 
$dx_2/dt = 2x_1 + x_2$
such that $x_1(0) = c_1$, $x_2(0) = c_2$.
10. Solve the system
$dx_1/dt = tx_1 + t^2 x_2 + t^3 x_3$
$dx_2/dt = tx_2 + t^2 x_3$
$dx_3/dt = tx_3$
11. Solve
$dx/dt = a(t) \cdot x + b(t) \cdot y$
$dy/dt = b(t) \cdot y + a(t) \cdot x$
where a(t), b(t) denote given functions of t.
12. Compute e*, where 


“8 


10. One-parameter groups and infinitesimal generators


The exponential of a matrix gives an important method of determining commutative
subgroups of matrix groups. Indeed, it follows at once from (9.6) that the
mapping

10.1    $t \to e^{ta}$,    a an n × n real or complex matrix

is a homomorphism of the additive group of real numbers into the multiplicative
group of invertible matrices. For by (9.6) we have

$e^{(s+t)a} = e^{sa + ta} = e^{sa} \cdot e^{ta}$

since $(sa) \cdot (ta) = (ta) \cdot (sa)$.
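
The role of the commutativity hypothesis can be seen numerically. In the sketch below (helper names and sample matrices are illustrative assumptions), the product rule holds for the commuting pair sa, ta, while a pair of non-commuting matrices b, c shows that $e^b \cdot e^c$ and $e^{b+c}$ can differ.

```python
def mat_mul(a, b):
    """Product of two square matrices given as nested lists."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def mat_add(a, b):
    """Entrywise sum of two matrices of the same size."""
    return [[x + y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]

def scale(a, t):
    """The matrix t * a."""
    return [[t * v for v in row] for row in a]

def exp_series(b, terms=40):
    """Partial sum I + b + b^2/2! + ... of the exponential series."""
    n = len(b)
    total = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    term = [row[:] for row in total]
    for m in range(1, terms + 1):
        term = [[v / m for v in row] for row in mat_mul(term, b)]
        total = mat_add(total, term)
    return total

a = [[1.0, 2.0], [0.0, -1.0]]
s, t = 0.3, 0.5
lhs = exp_series(scale(a, s + t))
rhs = mat_mul(exp_series(scale(a, s)), exp_series(scale(a, t)))
print(lhs, rhs)  # these agree: sa and ta commute

b = [[0.0, 1.0], [0.0, 0.0]]
c = [[0.0, 0.0], [1.0, 0.0]]
print(mat_mul(exp_series(b), exp_series(c)))  # e^b e^c
print(exp_series(mat_add(b, c)))              # e^{b+c}, a different matrix
```

Here bc and cb are different matrices, so (9.6) does not apply to the second pair.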

Conversely, it can be shown that any homomorphism of the additive group of
real numbers into the multiplicative group of invertible matrices can be expressed
as (10.1) for a suitable matrix a. The image of such a homomorphism is called a
one-parameter group, and the corresponding matrix a is called the infinitesimal
generator of the one-parameter group.


EXAMPLE The rotations about the origin of euclidean two-dimensional vector
space form a one-parameter group p. The matrices of the elements of p taken with
respect to any orthonormal base $u_1$, $u_2$ are

$\begin{pmatrix} \cos\theta & -\sin\theta \\ \sin\theta & \cos\theta \end{pmatrix}$

The calculation in Example 3, Sec. 9, shows that

$e^{\theta a} = \begin{pmatrix} \cos\theta & -\sin\theta \\ \sin\theta & \cos\theta \end{pmatrix}$

for all $\theta$, where

$a = \begin{pmatrix} 0 & -1 \\ 1 & 0 \end{pmatrix}$

Thus a is the infinitesimal generator of our one-parameter group. The corresponding
linear transformation $x \to ax$ of $R^2$ is accordingly referred to as an infinitesimal
rotation.


EXERCISES 


1. Define some one-parameter groups geometrically and find their infinitesimal 
generators. 
2. Let E be an n-dimensional euclidean vector space, and let B be an orthonormal
base in E. Let a be a skew-symmetric matrix.
(a) Prove that the linear mappings whose matrices with respect to B are $e^{ta}$,
$-\infty < t < \infty$, form a commutative group of rotations.
*(b) Is the homomorphism $t \to e^{ta}$ a monomorphism for any skew-symmetric a?
3. For any matrices a and b, with b invertible, prove

$b^{-1} e^{a} b = e^{b^{-1}ab}$

4. Prove: If the eigenvalues of a are $\lambda_1, \ldots, \lambda_n$, then the eigenvalues of $e^a$
are $e^{\lambda_1}, \ldots, e^{\lambda_n}$.
5. Using the Jordan normal form (see the following chapter), prove

(a) $e^a = \lim_{n \to \infty} \left( I + \frac{1}{n} a \right)^n$

(b) Can you offer some reason for calling a the infinitesimal generator of the
one-parameter group $e^{ta}$?


13 


The Jordan normal form 


1. Introduction 


In Sec. 9, Chap. 9, we saw that a linear mapping $T: U \to V$ of two finite-dimensional
vector spaces can be described very simply in terms of bases. Namely, it is
always possible to find bases $\{u_j\}$ and $\{v_i\}$ for U and V, respectively, such that
$T(u_i) = v_i$ (i = 1, ..., r; r = rank of T) and $T(u_i) = 0$ for i > r. In other
words, the matrix $t = (t^i_j)$ of T relative to that base pair is a diagonal matrix
($t^i_j = 0$ if $i \ne j$). Moreover, in Chap. 9 we developed an effective method for
finding such bases.

These considerations are valid in particular for a mapping $T: U \to U$ of a vector
space to itself. But they involve in general the simultaneous use of two different
bases $\{u_j\}$ and $\{v_i\}$ in U. For many purposes that is very inconvenient. The
present chapter is concerned with the problem of finding a single base $\{u_i\}$ for U
which exhibits the action of T as simply as possible. Or to phrase the problem
differently, we wish to find a base for U with respect to which the matrix of T has
the most agreeable form possible. But observe that we have imposed a rather
severe restriction in limiting ourselves to a single base for the analysis of T, rather
than to a pair of possibly different bases.

A partial answer to this problem has already been pointed out in Sec. 5, Chap. 11
(see Theorem 5.5). We recall that a vector $e \ne 0$ in U is called an eigenvector
of T if

1.1    T(e) = pe

for some scalar p (p is then called an eigenvalue of T, and e is said to "belong" to p).
Now if T happens to have n linearly independent eigenvectors, say $e_1, e_2, \ldots, e_n$,
where n = dim U, with corresponding eigenvalues $p_1, p_2, \ldots, p_n$, then we have

1.2    $T(e_i) = p_i e_i \qquad (i = 1, \ldots, n)$

and the $e_i$ form a base for U. The matrix of T relative to the base $\{e_i\}$ is the
diagonal matrix whose diagonal elements are $p_1, p_2, \ldots, p_n$. The action of T
in the case at hand can be described very simply as a "stretching" in the n directions
determined by $e_1, \ldots, e_n$; and the eigenvalues are just the stretching
factors.




This situation can be described somewhat differently as follows: Each of the
eigenvectors $e_i$ generates a one-dimensional subspace $V_i$ (consisting of all vectors
$xe_i$, x any scalar), and from (1.2) it is clear that T maps $V_i$ into itself. Furthermore,
U is the direct sum $V_1 \oplus \cdots \oplus V_n$ (see Definition 5.2, Chap. 8). For every
x in U can be expressed uniquely as a sum $x = x^1 e_1 + x^2 e_2 + \cdots + x^n e_n$, of
which each term $x^i e_i$ (no summation!) is an element of $V_i$.

In general the situation is not so simple, for a linear operator T need not have n
linearly independent eigenvectors. For example, the mapping $Q^2 \to Q^2$ given by
the matrix 


() 


has only the vectors of the form 


(;) x any rational number 


as eigenvectors. We shall see that the problem we have set for ourselves leads us 
in general to a decomposition of U into subspaces analogous to the V; mentioned 
above, but in general of dimension greater than 1. 


2. Elementary linear mappings 


Here we shall catalogue some particularly simple types of linear mappings whose
actions can easily be "visualized." U denotes an n-dimensional vector space over
some field K.

TYPE I Let T denote the mapping T = aI of U, where I is the identity mapping.
Thus T(x) = ax for every x in U (hence every element of U is an eigenvector of
T belonging to the eigenvalue a). T operates by stretching each element of U by
the factor a. The matrix of T relative to any base is $a \cdot I$, that is,

$\begin{pmatrix} a & 0 & \cdots & 0 \\ 0 & a & \cdots & 0 \\ \vdots & & \ddots & \vdots \\ 0 & 0 & \cdots & a \end{pmatrix}$

The characteristic polynomial of T (Definition 5.1, Chap. 11) is $(t - a)^n$. The
minimal polynomial† of T is $t - a$. Mappings of this type are sometimes called
scalar mappings.


TYPE II Suppose that $T: U \to U$ is a linear mapping for which there is a base
$B = \{u_1, \ldots, u_n\}$ in U such that

2.1    $T(u_i) = u_{i-1} \qquad (i = 2, 3, \ldots, n)$
       $T(u_1) = 0$

† That is, the monic polynomial f(t) of least degree with coefficients in K such that f(T) = 0.




We shall call such a T nilcyclic of order n, and a base B for which (2.1) holds will
be called a cyclic base for T. Taking n = 4 for simplicity, the matrix of T relative
to the cyclic base B is

$\begin{pmatrix} 0 & 1 & 0 & 0 \\ 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & 1 \\ 0 & 0 & 0 & 0 \end{pmatrix}$

For any n, the matrix of T has zeros everywhere except for a diagonal sequence of
1's immediately above the main diagonal. The characteristic polynomial of T is
quickly seen to be $t^n$. The minimal polynomial of T must divide the characteristic
polynomial (Theorem 4.6, Chap. 11) and so must be a power of t. It is not hard
to see in fact that the minimal polynomial is also $t^n$. For, referring to (2.1), we
have

$T^2(u_i) = u_{i-2} \qquad (i = 3, 4, \ldots, n)$
$T^2(u_2) = T^2(u_1) = 0$

More generally,

2.2    $T^k(u_i) = u_{i-k} \qquad (i > k)$
       $T^k(u_i) = 0 \qquad (i \le k)$

Hence $T^n = 0$, but $T^{n-1} \ne 0$, since $T^{n-1}(u_n) = u_1$.
Observe that the cyclic base B is obtained by applying T repeatedly to the one
element $u_n$, as follows from (2.2).
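
The behavior of the powers of a nilcyclic mapping is easy to observe on its matrix. The following sketch (function names are illustrative) builds the n × n matrix with 1's immediately above the main diagonal, taking column j to hold the coordinates of the image of the jth base vector, and computes its powers with integer arithmetic.

```python
def nilcyclic(n):
    """n x n matrix with 1's just above the main diagonal, zeros elsewhere."""
    return [[1 if j == i + 1 else 0 for j in range(n)] for i in range(n)]

def mat_mul(a, b):
    """Product of two square integer matrices given as nested lists."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def power(a, k):
    """k-th power of a square matrix, with power(a, 0) the identity."""
    n = len(a)
    result = [[1 if i == j else 0 for j in range(n)] for i in range(n)]
    for _ in range(k):
        result = mat_mul(result, a)
    return result

n = 4
t = nilcyclic(n)
for k in range(1, n + 1):
    print(k, power(t, k))
```

Each multiplication pushes the diagonal of 1's one step further from the main diagonal; after n − 1 steps a single 1 remains in the upper right corner, and the nth power is the zero matrix.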


TYPE III If $T: U \to U$ is such that there is a base $B = \{e_1, \ldots, e_n\}$ of U for
which

$T(e_i) = p_i e_i \qquad (i = 1, \ldots, n;\ p_i \text{ in } K)$

then we shall call T a diagonal mapping. The equation here is the same as (1.2),
and so our requirement is that T possess n linearly independent eigenvectors. The
characteristic polynomial of T is $(t - p_1)(t - p_2) \cdots (t - p_n)$. T, acting on the
one-dimensional subspace spanned by $e_i$, is of Type I, and therefore T can be
thought of as a sum of mappings of Type I, in a sense to be made more precise
below. On the other hand, Type III obviously includes Type I.

We shall see that mappings of Types I and II can be used as building blocks
out of which we can construct the most general linear mapping.


REMARK. T being a diagonal mapping, as above, we have $T^2(e_i) = T(p_i e_i) =
p_i T(e_i) = p_i^2 e_i$. By a simple induction, $T^k(e_i) = p_i^k e_i$ for k = 1, 2, 3, ... (cf.
Theorem 5.9, Chap. 11). If $T^k = 0$, then $p_i^k = 0$ for i = 1, 2, ..., n, whence
$p_1 = p_2 = \cdots = p_n = 0$, since the $p_i$ are elements of a field. Therefore, no
power of a diagonal mapping T can be zero unless T = 0.


EXERCISES 

1. Let T be a nileyclie mapping of order x. What are the characteristic and 
minimal polynomials of T*? Taking n = 5, write down the matrices of T’, T?, 
T' relative to a eyclie base B for T. 

2. Let $T: U \to U$ be a linear mapping of an n-dimensional vector space, and
suppose that T − pI is nilcyclic. Let B be a cyclic base in U for this mapping.
Compute the matrix of T relative to the base B.

3. A linear mapping T is called nilpotent if $T^m = 0$ for some integer m. Suppose
that S, T are both nilpotent mappings of a vector space U to itself, and suppose
that ST = TS. Prove that S + T is nilpotent.

4. Let $P_n$ be the vector space of polynomials of degree < n in a variable t over
a field K. Let $D: P_n \to P_n$ be the differentiation operator. Show that D is
nilcyclic.

5. Let T be a nilcyclic mapping of order m. Compute the rank and nullity of
$T^k$ (k = 0, 1, 2, etc.).

In the following exercises compute the matrix of f(T) relative to the specified 
base B: 

6. f(t) =P — 5t +1, T = cl, B any base. 


7. f(t) = & +1, T nileyclie, B a eyclic base of five elements. 
8 f® = +3, with T and B as in Exercise 7. 
% f) = 6 +56, with T and B as in Exercise 7. 


10. f@) = 8 ~ 52 4447, T and B as in Exercise 7. 

11. f® = 4, where T = cl +N, with N nileyclic and B a cyclic base for N. 

12. f() = #, T and B as in Exercise 11. 

13. #() = ®, T and B as in Exereise 11. 

14, Let T be a linear mapping of an n-dimensional vector space to itself. Prove 
that T is nilpotent if and only if the characteristic polynomial of T is y(t) = t". 


3. Direct sum decompositions 


Here we shall recall some things from Sec. 5, Chap. 8, and Sec. 5, Chap. 11
(Theorems 5.6 and 5.7).

DEFINITION 3.1 Let $W_1, \ldots, W_r$ be subspaces of a vector space U. Then U is said
to be the direct sum of those subspaces if every element x of U can be expressed in one
and only one way as a sum $x = x_1 + x_2 + \cdots + x_r$ with $x_i$ in $W_i$ (i = 1, ..., r).
If these conditions are fulfilled, then we write

$U = W_1 \oplus W_2 \oplus \cdots \oplus W_r$

EXAMPLE 1 Let U have finite dimension n, and let $u_1, \ldots, u_n$ be a base. Denote
by $V_i$ the one-dimensional subspace consisting of all vectors $au_i$ (a an arbitrary
scalar). Since every x in U can be expressed uniquely in the form $x = x^1 u_1 +
\cdots + x^n u_n$, of which the ith term $x^i u_i$ is in $V_i$, it follows that $U = V_1 \oplus
\cdots \oplus V_n$.



EXAMPLE 2 Let $W_1, \ldots, W_r$ be vector spaces over the same field K. Let U
denote the set of all r-tuples $x = (x_1, x_2, \ldots, x_r)$, with $x_i$ in $W_i$ (i = 1, ..., r).
Define addition in U by $(x_1, \ldots, x_r) + (y_1, \ldots, y_r) = (x_1 + y_1, x_2 +
y_2, \ldots, x_r + y_r)$. Define scalar multiplication by $c(x_1, \ldots, x_r) = (cx_1,
cx_2, \ldots, cx_r)$. With these definitions, U is a vector space over K (trivial
verification). For fixed i, all elements of U of the type $(0, \ldots, x_i, \ldots, 0)$, with
zeros except in the ith place, form a subspace $W_i'$ of U. U is the direct sum of
$W_1', \ldots, W_r'$, and $W_i'$ is isomorphic to $W_i$.


THEOREM 3.1 Let the vector space U be a direct sum of subspaces $W_1, W_2, \ldots, W_r$.
If $w_1, w_2, \ldots, w_r$ are nonzero elements of $W_1, W_2, \ldots, W_r$, respectively, then they
are linearly independent. Furthermore, if an arbitrary x in U is expressed as a sum
$x = x_1 + x_2 + \cdots + x_r$, with $x_i$ in $W_i$ for i = 1, ..., r, then each $x_i$ is uniquely
determined by x, and the mapping $x \to x_i$ is a linear mapping of U onto $W_i$, called the
projection onto $W_i$.

Proof. For the first assertion, each $W_i$ contains 0, and therefore Definition 3.1,
applied to the zero element, says that if we express 0 as a sum
$0 = a_1 + a_2 + \cdots + a_r$ with $a_i$ in $W_i$ for each i, then the a's must all
be zero. Now if $c_1 w_1 + \cdots + c_r w_r = 0$ for certain scalars $c_i$, then, by
what was just said, each term $c_i w_i$ is zero, and so each $c_i$ is zero, because
$w_i \ne 0$.

For the second assertion, the fact that $x_i$ is uniquely determined by x
is just the requirement of Definition 3.1. Hence the operation that assigns
$x_i$ to x is well defined. That is, $x \to x_i$ is really a mapping of U to $W_i$.
It is routine to verify that it is a linear mapping. Q.E.D.


Let $P_i: U \to U$ be the ith projection, as defined in the theorem above, but considered
as a mapping from U to itself rather than to the subspace $W_i$. We can
form the composition $P_j P_i$ of any two of these mappings, and the result is easy
to calculate. Thus, let x be an arbitrary element of U, and write it in the form
$x = x_1 + \cdots + x_i + \cdots + x_r$, with each $x_i$ in $W_i$. Doing the same thing for
any one of the $x_i$, we get $x_i = 0 + \cdots + x_i + \cdots + 0$ (all terms zero except
the ith), by Definition 3.1. Then by definition we have $P_j(x_i) = 0$ if $j \ne i$, and
$P_i(x) = P_i(x_i) = x_i$. Hence, $P_j P_i(x) = P_j(x_i) = 0$ or $x_i$, according as $j \ne i$ or
j = i. Thus

3.1    $P_j P_i = 0$ if $j \ne i$,    $P_j P_i = P_i$ if j = i

In particular, $P_i^2 = P_i$. (In general, an endomorphism T such that $T^2 = T$ is
called idempotent. It follows that $T^3 = T$, etc.)
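
A small concrete case may help in reading the relations among the projections. In the sketch below the plane is written as the direct sum of the line spanned by (1, 0) and the line spanned by (1, 1); this choice of subspaces, and the names `P1` and `P2`, are assumptions made purely for illustration. Since (x, y) = (x − y)(1, 0) + y(1, 1), the two projection matrices can be written down by hand.

```python
def mat_mul(a, b):
    """Product of two square integer matrices given as nested lists."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

# (x, y) = (x - y) * (1, 0) + y * (1, 1)
P1 = [[1, -1], [0, 0]]  # (x, y) -> (x - y, 0), the component along (1, 0)
P2 = [[0, 1], [0, 1]]   # (x, y) -> (y, y),     the component along (1, 1)

print(mat_mul(P1, P1))  # equals P1
print(mat_mul(P2, P2))  # equals P2
print(mat_mul(P1, P2))  # the zero matrix
print(mat_mul(P2, P1))  # the zero matrix
print([[P1[i][j] + P2[i][j] for j in range(2)] for i in range(2)])  # the identity
```

Note that neither projection is an orthogonal projection here; only the direct sum decomposition matters.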

The following proposition gives a useful criterion for a space to be the direct 
sum of two subspaces. 




PROPOSITION 3.2 Let $W_1$ and $W_2$ be subspaces of a vector space U, and suppose that
every vector in U can be expressed in at least one way as a sum of an element of $W_1$
and an element of $W_2$. Then U is the direct sum of $W_1$ and $W_2$ if and only if 0 is the
only element common to both $W_1$ and $W_2$.

Proof. By assumption, an arbitrary x in U can be expressed as a sum
$x = x_1 + x_2$, with $x_1$ in $W_1$ and $x_2$ in $W_2$. Suppose that $x = x_1' + x_2'$ is a
second expression of the same type for x. Then $0 = x_1 - x_1' + x_2 - x_2'$,
or $x_1 - x_1' = x_2' - x_2$. By assumption, $x_1$ and $x_1'$ are in $W_1$, and so $x_1 - x_1'$
is in $W_1$. Similarly, $x_2' - x_2$ is in $W_2$. Therefore, the element $y = x_1 -
x_1' = x_2' - x_2$ is in both $W_1$ and $W_2$. If $W_1$ and $W_2$ have only the zero
element in common, then y = 0, and consequently the two expressions
for x must be identical. Hence $U = W_1 \oplus W_2$. Suppose, conversely,
that U is the direct sum $W_1 \oplus W_2$, and let v be a nonzero element in
both $W_1$ and $W_2$. Then v = 0 + v = v + 0, and these are two different
expressions for v of the type $v = v_1 + v_2$ ($v_1$ in $W_1$, $v_2$ in $W_2$), contradicting
Definition 3.1.


COROLLARY Let $T: U \to U$ be an idempotent endomorphism of a vector space.
That is, $T^2 = T$. Then U is the direct sum of the kernel of T and of the image of T.

Proof. Put $W_1$ = Im T and $W_2$ = Ker T. If y is in both $W_1$ and
$W_2$, then y = T(x) for some x and also T(y) = 0. Hence $T^2(x) = 0$.
But $T^2(x) = T(x) = y$; so y = 0. Hence $W_1$, $W_2$ have only the zero
in common. For an arbitrary x in U, put $x_1 = T(x)$ and $x_2 = x - T(x) =
x - x_1$. Then $x = x_1 + x_2$, and $x_1$ is in $W_1$, $x_2$ is in $W_2$. For in fact
$T(x_2) = T(x) - T^2(x) = T(x) - T(x) = 0$. The assertion follows by
Proposition 3.2 above. Q.E.D.
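
The splitting used in the proof, x = T(x) + (x − T(x)), can be carried out mechanically. The sketch below uses an illustrative 3 × 3 idempotent matrix chosen only for this purpose; the names `T`, `apply`, and `split` are assumptions.

```python
def apply(t, x):
    """Image of the vector x under the matrix t."""
    return [sum(t[i][k] * x[k] for k in range(len(x))) for i in range(len(t))]

# An illustrative idempotent matrix: T(x, y, z) = (x, y, x + y).
T = [[1, 0, 0],
     [0, 1, 0],
     [1, 1, 0]]

def split(t, x):
    """Return (x1, x2) with x1 = T(x) in Im T and x2 = x - T(x) in Ker T."""
    x1 = apply(t, x)
    x2 = [a - b for a, b in zip(x, x1)]
    return x1, x2

x = [2, -1, 7]
x1, x2 = split(T, x)
print(x1, x2)        # the two components of x
print(apply(T, x1))  # T fixes x1, since x1 lies in the image of T
print(apply(T, x2))  # T annihilates x2, since x2 lies in the kernel of T
```

The same two lines of `split` work for any idempotent matrix, which is exactly the content of the corollary.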


PROPOSITION 3.3 Let U be the direct sum of finite-dimensional subspaces $W_1, W_2,
\ldots, W_r$. Let $B_i$ denote a base for $W_i$ (i = 1, ..., r). Then $B_1, B_2, \ldots,
B_r$, taken jointly, constitute a base for U. In particular, dim U = dim $W_1$ + $\cdots$ +
dim $W_r$.

Proof. The elements of $B_1, \ldots, B_r$, taken jointly, are linearly independent.
For let $B_1 = \{u_1, \ldots, u_s\}$, $B_2 = \{v_1, \ldots, v_t\}$, etc. If
these elements were linearly dependent, then there would exist a linear
relation

3.2    $a_1 u_1 + \cdots + a_s u_s + b_1 v_1 + \cdots + b_t v_t + \cdots = 0$

Put $w_1 = a_1 u_1 + \cdots + a_s u_s$, $w_2 = b_1 v_1 + \cdots + b_t v_t$, etc. Then the
equation above is

$w_1 + w_2 + \cdots + w_r = 0$

From Theorem 3.1 we must have $w_1 = 0$, $w_2 = 0$, ..., $w_r = 0$. Since
$u_1, \ldots, u_s$ are linearly independent, $w_1 = 0$ is only possible if $a_1 = 0$,
..., $a_s = 0$. Similarly, from $w_2 = 0$ there follows $b_1 = 0$, ..., $b_t =
0$, etc. Hence all the coefficients in (3.2) must be zero.

Then to show that all the elements in the bases $B_i$ form a base for U,
we have only to show that they span U. If we write an arbitrary x in
U in the form $x = x_1 + \cdots + x_r$, with $x_i$ in $W_i$, then $x_i$ can be expressed
in terms of the base $B_i$ (i = 1, ..., r). Hence x itself can be
expressed as a linear combination of all the elements in $B_1, \ldots, B_r$.
Q.E.D.


We now tie up the notion of direct sum with linear mappings. 


DEFINITION 3.2 Let $T: U \to U$ be an endomorphism of a vector space U. A subspace
V of U is called T-stable if T maps V to V.

For example, the subspace of U consisting of the zero element alone is T-stable
for any T.

If V is a T-stable subspace of U, then we denote by $T_V$ the restriction of T to
V. That is, $T_V: V \to V$ is the mapping defined by $T_V(x) = T(x)$ for x in V. It
is clearly a linear mapping.

DEFINITION 3.3 Let $T, T_1, T_2, \ldots, T_r$ be endomorphisms of a vector space U. Then
T will be called the direct sum of $T_1, \ldots, T_r$ if

3.3    $T = T_1 + \cdots + T_r$
       $T_i T_j = 0$    for $i \ne j$

To indicate this situation we shall sometimes write

$T = T_1 \oplus \cdots \oplus T_r$

EXAMPLE Let $U = W_1 \oplus \cdots \oplus W_r$ and let $P_1, \ldots, P_r$ be the projection
operators of (3.1). From (3.1) it follows readily that

$I = P_1 \oplus \cdots \oplus P_r$


The following theorem shows how linear mappings can be built up from map- 
pings defined on subspaces. 


PROPOSITION 3.4 Let a vector space U be a direct sum $U = W_1 \oplus \cdots \oplus W_r$ of
certain subspaces. For given i let $T_i: W_i \to W_i$ be a linear mapping. Then there
is a uniquely determined linear mapping $T_i': U \to U$ such that $T_i'(x) = T_i(x)$ for
x in $W_i$ and $T_i'(x) = 0$ for x in $W_j$ ($j \ne i$). If $T_i$ is given for each i, then $T =
T_1' + \cdots + T_r'$ is a direct sum; the subspaces $W_i$ are T-stable, and T is the unique
linear mapping of U to itself whose restriction to $W_i$ is the same as $T_i$ for i = 1,
..., r. We denote T by the symbol $T_1 \oplus \cdots \oplus T_r$.

Proof. Let $P_i: U \to U$ be the projection of (3.1). Define $T_i'$ by
$T_i'(x) = T_i(P_i(x))$ for any x in U. It is clear that $T_i'$ thus defined has
the required properties and is uniquely determined by them. Furthermore,
it is evident that $P_i T_i' = T_i' P_i = T_i'$. Hence

$T_i' \circ T_j' = (T_i' \circ P_i) \circ (P_j \circ T_j') = T_i' \circ (P_i \circ P_j) \circ T_j' = 0$    if $i \ne j$

by (3.1). This shows that $T = T_1' + \cdots + T_r'$ satisfies the conditions
of Definition 3.3. The remaining assertions are trivial to verify. Q.E.D.


REMARK. The mapping T of the theorem can be denoted by any of the three
symbols $T_1 \oplus \cdots \oplus T_r$, $T_1' \oplus \cdots \oplus T_r'$, $T_1' + \cdots + T_r'$, in accordance with
our conventions.


The following theorem is a kind of converse to the preceding: 


PROPOSITION 3.5 Let $T: U \to U$ be an endomorphism of a vector space, and let U
be a direct sum $U = W_1 \oplus \cdots \oplus W_r$ of T-stable subspaces. Then for each i
there is a unique linear mapping $T_i: U \to U$ such that $T_i(x) = T(x)$ for x in $W_i$
and $T_i(x) = 0$ for x in $W_j$ ($j \ne i$). Furthermore T is the direct sum $T = T_1 \oplus
\cdots \oplus T_r$. We call $T_i$ the $W_i$ part of T.

Proof. Let $P_1, \ldots, P_r$ be the projection mappings of (3.1). Define
$T_i$ by $T_i = T \circ P_i$. Since the $W_i$ are T-stable, we have $T \circ P_i = P_i \circ T$.
Therefore

$T_i \circ T_j = (T \circ P_i) \circ (T \circ P_j) = T \circ T \circ P_i \circ P_j = 0$    if $i \ne j$

The other assertions are trivial to verify. Q.E.D.


EXAMPLE 3 Let T be a diagonal mapping of an n-dimensional vector space U
(Sec. 2, Type III). Let $B = \{e_1, \ldots, e_n\}$ be a base for which the matrix of T is a
diagonal matrix, say $T(e_i) = p_i e_i$ (i = 1, ..., n). Let $V_i$ denote the subspace
of U generated by $e_i$. Then $V_i$ is T-stable. Denoting by $T_i: V_i \to V_i$ the restriction
of T to the elements of $V_i$, then $T = T_1 \oplus \cdots \oplus T_n$ by Proposition 3.4. We
can equally well write $T = T_1' \oplus \cdots \oplus T_n'$, where $T_i'$ is the $V_i$ part of T (cf.
Proposition 3.5). That is, $T_i'(x) = T(x)$ for x in $V_i$, and $T_i'(x) = 0$ for x in $V_j$ with
$j \ne i$.

Hence, a diagonal mapping is a direct sum of mappings of Type I. This gives a
precise meaning to the observation at the end of the paragraph of Sec. 2 concerning
mappings of Type III.

The following proposition has occurred as a part of Theorem 5.7, Chap. 11. 


PROPOSITION 3.6 Let $T: U \to U$ be a linear mapping of a finite-dimensional vector
space U, and let U be a direct sum of T-stable subspaces $W_1, \ldots, W_r$. Let $B_i$
denote a base for $W_i$ (i = 1, ..., r), and let $B = \{B_1, B_2, \ldots, B_r\}$ be the base
for U consisting of all the vectors in the $B_i$, in the indicated order. Let $T_i$ be the
restriction of T to $W_i$, so that $T = T_1 \oplus \cdots \oplus T_r$, and let $a_i$ denote its matrix relative
to the base $B_i$. If a denotes the matrix of T relative to the base B then

3.4    $a = \begin{pmatrix} a_1 & & & \\ & a_2 & & \\ & & \ddots & \\ & & & a_r \end{pmatrix}$

(This symbolism means that a consists of the square matrices $a_1, \ldots, a_r$
arranged along the diagonal, with zeros elsewhere.)

Proof. Recall first from Proposition 3.3 that B is in fact a base for U.
If $B_1 = \{u_1, \ldots, u_s\}$, say, then the effect of T on these elements is the
same as the effect of $T_1$. Thus $T(u_1), \ldots, T(u_s)$ can all be expressed as
linear combinations of $u_1, \ldots, u_s$ by means of the matrix $a_1$. Consequently,
the column of a in (3.4) corresponding to $u_1$ (for example) will
have nonzero entries only in those rows corresponding to $u_1, \ldots, u_s$, and
in those entries it will have the same elements as in the first column of $a_1$.
The same is true for the other columns. Q.E.D.


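EXAMPLE 4 Let $U$ be a vector space with base $B = \{e_1, e_2, e_3, e_4\}$, and let $T$ be the linear mapping of $U$ determined by

\[
T(e_1) = 3e_1, \qquad T(e_2) = -2e_2, \qquad T(e_3) = e_4, \qquad T(e_4) = e_3
\]

Let $V_1$ be the subspace generated by $B_1 = \{e_1\}$, let $V_2$ be the subspace generated by $B_2 = \{e_2\}$, and let $V_3$ be the subspace generated by $B_3 = \{e_3, e_4\}$. Plainly we have $U = V_1 \oplus V_2 \oplus V_3$, and the three subspaces are $T$-stable. Letting $T_i$ denote the restriction of $T$ to $V_i$, we have $T = T_1 \oplus T_2 \oplus T_3$. The matrix $a$ of $T$ relative to $B$ is

\[
a = \begin{pmatrix} a_1 & & \\ & a_2 & \\ & & a_3 \end{pmatrix}
\]

where

\[
a_1 = (3), \qquad a_2 = (-2), \qquad a_3 = \begin{pmatrix} 0 & 1 \\ 1 & 0 \end{pmatrix}
\]

are the matrices of $T_1$, $T_2$, $T_3$ relative to $B_1$, $B_2$, $B_3$, respectively.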
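The block arrangement of Proposition 3.6 can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the helper names `block_diag` and `apply` are hypothetical, the blocks are taken to be $a_1 = (3)$, $a_2 = (-2)$ and the 2 x 2 matrix interchanging $e_3$ and $e_4$, and column $j$ of a matrix is assumed to hold the coordinates of $T(e_j)$.

```python
def block_diag(*blocks):
    """Place square blocks along the diagonal, with zeros elsewhere."""
    n = sum(len(b) for b in blocks)
    a = [[0] * n for _ in range(n)]
    offset = 0
    for b in blocks:
        for i, row in enumerate(b):
            for j, entry in enumerate(row):
                a[offset + i][offset + j] = entry
        offset += len(b)
    return a


def apply(matrix, vector):
    """Coordinates of the image of `vector` (matrix times column vector)."""
    return [sum(row[j] * vector[j] for j in range(len(vector)))
            for row in matrix]


# Blocks assumed for Example 4 (hypothetical variable names).
a1 = [[3]]
a2 = [[-2]]
a3 = [[0, 1],
      [1, 0]]
a = block_diag(a1, a2, a3)

e3 = [0, 0, 1, 0]
print(a)
print(apply(a, e3))  # coordinates of T(e3), which should be those of e4
```

Because every entry outside the blocks is zero, the image of a base vector of one subspace never acquires a component in another subspace, which is the matrix form of T-stability.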


EXAMPLE 5 Referring to Example 3, the matrix of $T_i$ relative to the base $B_i = \{e_i\}$ of $V_i$ is the $1 \times 1$ matrix $(p_i)$. The matrix of $T = T_1 \oplus \cdots \oplus T_n$ relative to the joint base $B$ is the diagonal matrix

\[
\begin{pmatrix} p_1 & & \\ & \ddots & \\ & & p_n \end{pmatrix}
\]

in conformity with Theorem 3.6.


EXERCISES 


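1. Let $T$ be an endomorphism of a vector space $U$, and let $U$ be the direct sum of $T$-stable subspaces $W_1, \ldots, W_r$. Let $T_i$ denote the restriction of $T$ to $W_i$. Prove that

\[
\operatorname{Ker} T = \operatorname{Ker} T_1 \oplus \cdots \oplus \operatorname{Ker} T_r
\]

and

\[
\operatorname{Im} T = \operatorname{Im} T_1 \oplus \cdots \oplus \operatorname{Im} T_r
\]

2. Let $T, T_1, \ldots, T_r$ be endomorphisms of a vector space, and suppose that $T = T_1 \oplus \cdots \oplus T_r$ (direct sum). Prove that $T^k = T_1^k \oplus \cdots \oplus T_r^k$ for $k = 1, 2, 3$, etc.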


3. Let $U$ be a vector space with base $\{e_1, e_2\}$, and let $T$ be the nilcyclic mapping such that $T(e_1) = 0$ and $T(e_2) = e_1$. Let $V$ be the subspace generated by $e_1$. Is $V$ a $T$-stable subspace? Does there exist a $T$-stable subspace $W$ such that $U = V \oplus W$?

4. Let $T: U \to U$ be an endomorphism of an $n$-dimensional vector space, and let $V$ be an $r$-dimensional subspace. Let $B = \{e_1, \ldots, e_n\}$ be a base for $U$, and suppose that $A = \{e_1, \ldots, e_r\}$ is a base for $V$. Prove that $V$ is a $T$-stable subspace if and only if the matrix of $T$ relative to $B$ has the form

\[
\begin{pmatrix} * & * \\ 0 & * \end{pmatrix}
\]

That is, the lower left-hand corner is the $(n - r) \times r$ zero matrix.

5. Let $T: U \to U$ be an endomorphism of a vector space, and let $V$ be a $T$-stable subspace. If $T$ has an inverse, show that $V$ is stable for $T^{-1}$ and that $(T^{-1})_V = (T_V)^{-1}$ (restrictions to $V$).

6. Let $T: U \to U$ be an endomorphism of a vector space which is a direct sum of $T$-stable subspaces $W_1, \ldots, W_r$. Let $T_i$ denote the restriction of $T$ to $W_i$. Show that $T$ has an inverse if and only if each $T_i: W_i \to W_i$ is an isomorphism. If that is so, show that

\[
T^{-1} = T_1^{-1} \oplus \cdots \oplus T_r^{-1}
\]
7. Let $T, T_1, \ldots, T_r$ be as in Exercise 6, and assume that $U$ is of finite dimension. Prove that

\[
\det T = (\det T_1) \cdots (\det T_r)
\]

8. The mappings $T, T_1, \ldots, T_r$ being as in Exercise 6, let $f(t)$ be a polynomial in a variable $t$ with coefficients in the field of scalars. Prove that

\[
f(T) = f(T_1) \oplus \cdots \oplus f(T_r)
\]

if $f(0) = 0$.

9. The mappings being as in Exercise 7, let $\varphi(t), \varphi_1(t), \ldots, \varphi_r(t)$ be their respective characteristic polynomials. Prove that

\[
\varphi(t) = \varphi_1(t) \cdots \varphi_r(t)
\]

10. Let $T$ be an endomorphism of a finite-dimensional vector space $U$, and let $x$ be a nonzero vector in $U$. Let $k$ be the largest integer such that $x, T(x), \ldots, T^{k-1}(x)$ are linearly independent. Show that they generate a $k$-dimensional $T$-stable subspace $V$ of $U$. Show that there are uniquely determined scalars $a_0, \ldots, a_{k-1}$ such that $T^k(x) = a_0 x + \cdots + a_{k-1} T^{k-1}(x)$, and prove that the characteristic polynomial of $T_V$ is $t^k - a_{k-1} t^{k-1} - \cdots - a_1 t - a_0$.


4. Nilpotent mappings 


An endomorphism $T: U \to U$ of a vector space is called nilpotent if $T^k = 0$ for some integer $k$. For example, nilcyclic mappings are nilpotent (Type II, Sec. 2). But a nonzero diagonal mapping is not nilpotent (see the Remark at the end of Sec. 2). A direct sum of nilpotent mappings is nilpotent; in particular, a direct sum of nilcyclic mappings is nilpotent. For let $T: U \to U$ be the direct sum $T = T_1 \oplus \cdots \oplus T_r$ of endomorphisms of subspaces $W_1, \ldots, W_r$, respectively, where $U = W_1 \oplus \cdots \oplus W_r$. Suppose that $T_i^{k_i} = 0$ for $i = 1, \ldots, r$, the $k_i$ being certain integers. Let $k$ be the greatest of these integers $k_i$. Then $T_i^k = 0$ for $i = 1, \ldots, r$, plainly. We claim that $T^k = 0$.

To show this, let $T_i'$ $(i = 1, \ldots, r)$ be the $W_i$ part of $T$ (cf. Proposition 3.5). Thus, $T_i'(x) = T_i(x)$ for $x$ in $W_i$, and $T_i'(x) = 0$ for $x$ in $W_j$ with $j \ne i$. Clearly $T_i'^{\,k} = 0$, since $T_i^k = 0$. From (3.3) we have

\[
T^2 = (T_1' + \cdots + T_r')^2 = T_1'^{\,2} + \cdots + T_r'^{\,2}
\]

since the terms $T_i' T_j'$ with $i \ne j$ disappear. By a simple induction we find similarly that

\[
T^k = T_1'^{\,k} + \cdots + T_r'^{\,k} = 0
\]

as claimed.

A direct sum of nilcyclic mappings is nilpotent, as was just shown. The main purpose of this paragraph is to show that a nilpotent mapping of a finite-dimensional vector space is the direct sum of nilcyclic parts. We require the following result:


THEOREM 4.1 Let $T$ be a linear mapping of a vector space $U$, and let $V$ be a $T$-stable subspace. Then there exists a vector space $U^0$ and two linear mappings $S: U \to U^0$ and $T_0: U^0 \to U^0$ such that

\[
\operatorname{Ker} S = V, \qquad \operatorname{Im} S = U^0, \qquad \text{and} \qquad ST = T_0 S
\]

This theorem has already been proved in Sec. 10, Chap. 9. The desired $U^0$, $S$, and $T_0$ are given by

\[
U^0 = U/V \tag{4.1}
\]

\[
T_0 = T_{U/V} \tag{4.2}
\]

where $U/V$ is the quotient module whose elements are all the different cosets $x + V$, $S: U \to U/V$ is the projection

\[
x \to x + V \tag{4.3}
\]

and $T_{U/V}$, the $U/V$ part of $T$, is defined by

\[
T_{U/V}: x + V \to T(x) + V \tag{4.4}
\]

for any coset $x + V$.

The relation $ST = T_0 S$ can be depicted by the diagram

\[
\begin{array}{ccc}
U & \xrightarrow{\;T\;} & U \\
{\scriptstyle S}\downarrow & & \downarrow{\scriptstyle S} \\
U^0 & \xrightarrow{\;T_0\;} & U^0
\end{array}
\]


We now come to the main result of this section. 


THEOREM 4.2 Every nilpotent mapping of a finite-dimensional vector space $U$ is the direct sum of nilcyclic mappings of certain subspaces, and $U$ is the direct sum of those subspaces.


Proof. We prove the theorem by induction on $\dim U$. If the dimension is 1, then the assertion is plainly true. For let $e$ be a base for $U$ (that is, any nonzero vector). Then $T(e) = ae$ for some scalar $a$. If $T$ is nilpotent, say $T^m = 0$, then $a^m = 0$, and so $a = 0$, $T = 0$. The base $e$ is a cyclic base for $T$.

Suppose then that the theorem holds for vector spaces of dimension $< n$, where $n$ is an integer greater than 1, and let $\dim U = n$. Further, let $m$ be the smallest integer such that $T^m = 0$. Then $T^{m-1} \ne 0$, and so there is a vector $x$ in $U$ such that $T^{m-1}(x) \ne 0$. Set $e_1 = T^{m-1}(x)$, $e_2 = T^{m-2}(x)$, $\ldots$, $e_{m-1} = T(x)$, $e_m = x$. The vectors in the set $B = \{e_1, \ldots, e_m\}$ are linearly independent. For suppose that $a_1 e_1 + \cdots + a_m e_m = 0$ for certain scalars $a_1, \ldots, a_m$. That is,

\[
a_1 T^{m-1}(x) + \cdots + a_{m-1} T(x) + a_m x = 0
\]

Applying $T^{m-1}$ to this and keeping in mind that $T^m = 0$, we get $a_m T^{m-1}(x) = 0$, whence $a_m = 0$. The equation is therefore

\[
a_1 T^{m-1}(x) + \cdots + a_{m-1} T(x) = 0
\]

Applying $T^{m-2}$ to this equation, we obtain $a_{m-1} T^{m-1}(x) = 0$, and so $a_{m-1} = 0$. Continuing in this way by a simple induction, we conclude that all the $a$'s are zero. Let $V$ denote the subspace of $U$ generated by the set $B$. Then $V$ is clearly $T$-stable, and the restriction of $T$ to $V$ is nilcyclic; $B$ is a cyclic base for $V$. The following observation is important:

(A). If $T^k(y)$ is in $V$ for some element $y$ of $U$, where $0 < k \le m$, then there is an element $y'$ in $U$ such that $T^k(y') = 0$ and such that $y' - y$ is in $V$.

To see this put $z = T^k(y)$. Then $z$ is in $V$ and $T^{m-k}(z) = 0$, since $T^m(y) = 0$. Expand $z$ in terms of the cyclic base $B$ in $V$, say, $z = c_1 e_1 + \cdots + c_m e_m$. Now $T^{m-k}$ sends $e_j$ into zero if $j \le m - k$, and it sends $e_j$ into $e_{j-m+k}$ if $j > m - k$. Applying $T^{m-k}$ to both sides of the equation we obtain

\[
0 = T^{m-k}(z) = c_{m-k+1} e_1 + \cdots + c_m e_k
\]

from which it follows that $c_{m-k+1}, \ldots, c_m$ must all be zero. Hence $z = c_1 e_1 + \cdots + c_{m-k} e_{m-k}$. Define $z'$ by the formula $z' = c_1 e_{k+1} + \cdots + c_{m-k} e_m$. Then clearly $T^k(z') = z$. Putting $y' = y - z'$ we have finally $T^k(y') = T^k(y) - T^k(z') = z - z = 0$, and $y' - y = -z'$ is an element of $V$, as desired.

Since $V$ is $T$-stable, we can apply Theorem 4.1 to the situation at hand. The theorem guarantees the existence of a vector space $U^0$ and of linear mappings $S$ and $T_0$ such that $\operatorname{Ker} S = V$, $\operatorname{Im} S = U^0$, and $ST = T_0 S$. From this last equation we have

\[
S \circ T^2 = (S \circ T) \circ T = (T_0 \circ S) \circ T = T_0 \circ (S \circ T) = T_0 \circ (T_0 \circ S) = T_0^2 \circ S
\]

Similarly, by a simple induction, we have

\[
S \circ T^k = T_0^k \circ S
\]

for $k = 0, 1, 2, 3$, etc. Putting $k = m$ here, we see that $T_0^m \circ S = 0$, since $T^m = 0$. But $S$ maps $U$ onto all of $U^0$, and it follows at once from $T_0^m S = 0$ that $T_0^m = 0$.

Now $\dim U = \dim \operatorname{Ker} S + \dim \operatorname{Im} S = \dim V + \dim U^0$. Since $\dim V = m > 0$, we must have $\dim U^0 < \dim U$. If $V = U$, then $T$ itself is nilcyclic, and we are done. If $V \ne U$, then $\dim U^0 > 0$. As just observed, the mapping $T_0$ of $U^0$ is nilpotent. Therefore, by our induction assumption, $U^0$ is the direct sum $U^0 = W_1^0 \oplus \cdots \oplus W_r^0$ of $T_0$-stable subspaces such that the restriction of $T_0$ to each $W_i^0$ is nilcyclic. Consequently each $W_i^0$ has a cyclic base $B_i^0$, obtained by applying $T_0$ repeatedly to a single element $u_i^0$. That is, if $m_i = \dim W_i^0$, then $B_i^0$ consists of the elements

\[
T_0^j(u_i^0) \qquad (j = 0, 1, \ldots, m_i - 1) \tag{4.5}
\]

(taken in the order of decreasing $j$); and $T_0^{m_i}(u_i^0) = 0$. Since $S$ maps $U$ onto $U^0$, for each $i$ there is an element $u_i$ in $U$ such that $S(u_i) = u_i^0$. Since $S \circ T^{m_i}(u_i) = T_0^{m_i}(S(u_i)) = T_0^{m_i}(u_i^0) = 0$, it follows that $T^{m_i}(u_i)$ is in the kernel of $S$, namely, in $V$. Moreover, $u_i$ can be altered at will by adding to it any element of $V$. We conclude from (A) above that $u_i$ can be chosen in such a way that

\[
T^{m_i}(u_i) = 0 \qquad (i = 1, \ldots, r) \tag{4.6}
\]

Now let $B_i$ be the set of elements

\[
T^j(u_i) \qquad (j = 0, 1, \ldots, m_i - 1) \tag{4.7}
\]

taken in the order of decreasing $j$, and let $W_i$ be the subspace of $U$ generated by $B_i$. From (4.6) and (4.7) it is clear that $W_i$ is $T$-stable. We must show that $U = W_1 \oplus \cdots \oplus W_r \oplus V$, and to do so we have only to show that $B_1, \ldots, B_r, B$, taken jointly, form a base for $U$. Then let $y$ be an arbitrary element of $U$. By assumption and Proposition 3.3, the element $S(y)$ in $U^0$ can be expressed as a linear combination of elements in $B_1^0, \ldots, B_r^0$. In that linear combination let us replace the elements (4.5) of $B_i^0$ by the corresponding elements (4.7) of $B_i$. The result is an element $y'$ of $U$ such that $S(y') = S(y)$, clearly, since $S$ maps $B_i$ into $B_i^0$. Thus $S(y - y') = 0$, and so $v = y - y'$ is an element of $V$ and can therefore be expressed as a linear combination of the elements of the base $B$. Hence $y = y' + v$ is a linear combination of elements in $B_1, \ldots, B_r, B$. The latter elements are linearly independent, as follows easily from similar reasoning (that is, by applying $S$ to any nontrivial relation among them).

Writing $W_{r+1}$ for $V$ and denoting by $T_i$ the restriction of $T$ to $W_i$ $(i = 1, \ldots, r + 1)$, we have

\[
T = T_1 \oplus \cdots \oplus T_{r+1}
\]

Furthermore, each $T_i$ is nilcyclic, and $B_i$ (or $B$ in the case of $W_{r+1}$) is a cyclic base for $W_i$. Q.E.D.


EXAMPLE Let $T$ be an endomorphism of $U$ whose matrix with respect to the base $A = \{u_1, u_2, u_3, u_4\}$ is

\[
\begin{pmatrix}
2 & -8 & 12 & -60 \\
2 & -5 & 9 & -48 \\
6 & -17 & 29 & -152 \\
1 & -3 & 5 & -26
\end{pmatrix}
\]

By direct calculation the characteristic polynomial of $T$ is found to be $t^4$. Hence $T^4 = 0$, and so $T$ is nilpotent. It is easily verified, using the matrix, that $T^2 = 0$. In the notation of the proof of Theorem 4.2, the integer $m$ here is then 2. Following the proof of Theorem 4.2, we take any vector $x$ such that $T(x) \ne 0$, and consider the subspace generated by $x$ and $T(x)$. Let us take $x$ to be $u_1$, and write $e_2 = u_1$ and $e_1 = T(u_1)$. That is,

\[
\begin{aligned}
e_2 &= u_1 \\
e_1 &= 2u_1 + 2u_2 + 6u_3 + u_4 = T(e_2)
\end{aligned} \tag{4.8}
\]

This pair of vectors generates a two-dimensional $T$-stable subspace $V_1$, and they form a cyclic base for $T$ restricted to $V_1$.

If we can find another vector $y$ such that $y$, $T(y)$ and $e_1$, $e_2$ are all linearly independent, then we shall have the desired decomposition of $T$. Let us try $u_4$. Write

\[
\begin{aligned}
e_4 &= u_4 \\
e_3 &= -60u_1 - 48u_2 - 152u_3 - 26u_4 = T(e_4)
\end{aligned} \tag{4.9}
\]

It is easy to see that (4.8) and (4.9) can be solved for $u_1, \ldots, u_4$ in terms of $B = \{e_1, \ldots, e_4\}$, so that the latter is a base of $U$. The matrix of $T$ relative to $B$ is

\[
\left(\begin{array}{cc|cc}
0 & 1 & 0 & 0 \\
0 & 0 & 0 & 0 \\
\hline
0 & 0 & 0 & 1 \\
0 & 0 & 0 & 0
\end{array}\right)
\]

Denoting by $V_2$ the subspace spanned by $e_3$ and $e_4$, we have $U = V_1 \oplus V_2$. Denoting by $T_i$ the restriction of $T$ to $V_i$, we have $T = T_1 \oplus T_2$, and $T_1$ and $T_2$ are nilcyclic.
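The computations of this example are easy to check by machine. The following Python sketch is only an illustration: the helper names `matmul` and `det` and the matrix names `A`, `P`, `J` are hypothetical, and column $j$ of each matrix is assumed to hold the coordinates of the image of the $j$th base vector. Instead of inverting the change-of-base matrix, it verifies the equivalent relation $AP = PJ$ together with $\det P \ne 0$.

```python
def matmul(x, y):
    """Product of two square matrices given as lists of rows."""
    n = len(x)
    return [[sum(x[i][k] * y[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def det(m):
    """Determinant by expansion along the first row (adequate for 4 x 4)."""
    if len(m) == 1:
        return m[0][0]
    total = 0
    for j, entry in enumerate(m[0]):
        minor = [row[:j] + row[j + 1:] for row in m[1:]]
        total += (-1) ** j * entry * det(minor)
    return total


# Matrix of T relative to A = {u1, u2, u3, u4}.
A = [[2, -8, 12, -60],
     [2, -5, 9, -48],
     [6, -17, 29, -152],
     [1, -3, 5, -26]]

# Columns of P: e1 = T(u1), e2 = u1, e3 = T(u4), e4 = u4, in A-coordinates.
P = [[2, 1, -60, 0],
     [2, 0, -48, 0],
     [6, 0, -152, 0],
     [1, 0, -26, 1]]

# Matrix of T relative to B = {e1, e2, e3, e4} as stated in the example.
J = [[0, 1, 0, 0],
     [0, 0, 0, 0],
     [0, 0, 0, 1],
     [0, 0, 0, 0]]

zero = [[0] * 4 for _ in range(4)]
print(matmul(A, A) == zero)            # T^2 = 0, so m = 2
print(det(P) != 0)                     # e1, ..., e4 form a base
print(matmul(A, P) == matmul(P, J))    # T(e_j) has coordinates J[:, j]
```

The relation $AP = PJ$ says column by column that $T(e_j)$, computed in the old coordinates, agrees with the combination of the $e_i$ prescribed by column $j$ of $J$.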


We shall abstain from giving an example exhibiting the various complications arising in the computations connected with Theorem 4.2. Since the proof of Theorem 4.2 is inductive in nature, it carries with it more or less explicit instructions for performing the calculations. In the notation of the proof, starting from $U$ we obtain an $m$-dimensional subspace $V$ generated by $x, T(x), \ldots, T^{m-1}(x)$. If $V = U$, then we are done. Otherwise, the induction step required involves replacing $U$ by a lower-dimensional space $U^0$ equipped with two linear mappings $S: U \to U^0$ and $T_0: U^0 \to U^0$ such that $\operatorname{Ker} S = V$ and $S \circ T = T_0 \circ S$. Then $T_0$ is again nilpotent (assuming $T$ is), and we can start afresh with $T_0$ and $U^0$, repeating the same argument for them. Only a finite number of reduction steps of this type are necessary, since the dimension drops at each step.


EXERCISES 


1. Let $T$ be a linear mapping whose matrix with respect to a base $\{e_1, e_2\}$ is

\[
\begin{pmatrix} 0 & 0 \\ 1 & 0 \end{pmatrix}
\]

Prove that $T$ is nilcyclic, and find a cyclic base.
2. Let $T$ be a linear mapping whose matrix with respect to a base $\{e_1, e_2, e_3\}$ is

\[
\begin{pmatrix} 0 & a & b \\ 0 & 0 & c \\ 0 & 0 & 0 \end{pmatrix}
\]

Prove that $T$ is nilpotent. Under what circumstances is it nilcyclic?

3. Let a linear mapping $T$ have a triangular matrix with respect to some base $\{e_1, \ldots, e_n\}$, and suppose that the diagonal entries are all zero. Prove that $T$ is nilpotent.

4. Prove the converse of Exercise 3. That is, if $T$ is a nilpotent mapping of a finite-dimensional vector space, then there is a base relative to which the matrix of $T$ has the desired form.

5. Let $T$ be a nilpotent endomorphism of an $n$-dimensional space $U$, and let $U = W_1 \oplus \cdots \oplus W_r$ be a direct sum decomposition into $T$-stable subspaces, on each of which $T$ is nilcyclic. Prove that

\[
r = \dim \operatorname{Ker} T
\]

[Hint: What is the dimension of $\operatorname{Ker} T_i$, where $T_i$ is the restriction of $T$ to $W_i$?]

6. Let $T$ be nilcyclic of order $m$. What is $\dim \operatorname{Ker} T$?

7. Hypotheses as in Exercise 5. Let $m_1, m_2, \ldots, m_r$ be the dimensions of $W_1, \ldots, W_r$. Prove that for any integer $h \ge 1$ the number of integers in the set $m_1, \ldots, m_r$ which are $\ge h$ is equal to

\[
\dim \operatorname{Ker} T^h - \dim \operatorname{Ker} T^{h-1}
\]

[Hint: Use Exercise 6 above and Exercise 1, Sec. 3.] (Taking $h = 1$ here we get the result of Exercise 5.) Conclude from the result above that $r$ as well as the set of integers $m_1, \ldots, m_r$ are uniquely determined by $T$ and do not depend upon the particular decomposition into nilcyclic parts.

8. Let $U$, $V$, $U^0$, $T$, $S$, $T_0$ be as in Theorem 4.1. Prove that if $\{u_1, \ldots, u_n\}$ is a base for $U$, then there is a subset of them which is carried by $S$ into a base for $U^0$.
Taking $n = 5$, let $V$ have the vectors
e1 =u) — 2m, + uy + Us 
2 = 2h — Us + day 
@; = Uy — Uy + Us 
as base. Show that S(u,) and S(us) form a base for U". If T has matrix 




f—-5 1 1 0 3 
1 1 4 1 0 
-2 -1 -4 0 0 
-1 1 12 1 
5 3 138 0 t 
relative to the given base {u:, ... , us}, show that V is T-stable, and compute 


the matrix of T° relative to the base {S(u,), S(us)}. 


5. Characteristic subspaces 


Let $T: U \to U$ be a linear mapping of an $n$-dimensional vector space over a field $K$, and let $\varphi(t)$ be the characteristic polynomial of $T$ (cf. Sec. 5, Chap. 11). We recall that if $a$ is the matrix of $T$ relative to any base in $U$, then

\[
\varphi(t) = \det (tI - a)
\]

and $\varphi(t)$ is a monic polynomial of degree $n = \dim U$ (see Theorem 4.2, Chap. 11). Throughout Secs. 5, 6, 7 we shall assume that all the roots of $\varphi(t)$ are in $K$. More precisely, we shall assume that $\varphi(t)$ splits into a product of factors of degree 1, say

\[
\varphi(t) = (t - p_1)(t - p_2) \cdots (t - p_n) \tag{5.1}
\]

all the $p_i$ being in $K$. If our field of scalars is the complex field, then the assumption is automatically verified, by virtue of Theorem 4.2, Chap. 6. Naturally some of the $p_i$ in (5.1) may be repeated several times. Suppose then that $p_1, p_2, \ldots, p_r$ are the distinct roots of $\varphi(t)$. Gathering repeated factors in (5.1) together, we can write it as

\[
\varphi(t) = (t - p_1)^{n_1} (t - p_2)^{n_2} \cdots (t - p_r)^{n_r} \tag{5.2}
\]

where $n_1, n_2, \ldots, n_r$ are positive integers such that $n_1 + \cdots + n_r = n$. We recall that the roots $p_i$ are called the eigenvalues of $T$.


DEFINITION 5.1 The kernel of the operator $(p_i I - T)^{n_i}$ is called the characteristic subspace of $U$ belonging to the eigenvalue $p_i$. We denote it by $V_i$. Hence, $x$ is in $V_i$ if and only if

\[
(T - p_i I)^{n_i}(x) = 0
\]

$V_i$ is indeed a subspace of $U$, for the kernel of any linear mapping is a subspace. Moreover, $V_i$ is $T$-stable. To see this we observe that

\[
T \circ (T - p_i I)^{n_i} = (T - p_i I)^{n_i} \circ T \tag{5.3}
\]

which follows at once from the polynomial identity

\[
t(t - p_i)^{n_i} = (t - p_i)^{n_i}\, t
\]

and the fact that the substitution $t \to T$ preserves polynomial identities (see Theorem 4.4, Chap. 11). Then if $x$ is in $V_i$, we have $(T - p_i I)^{n_i}(x) = 0$, by definition, and so $T(T - p_i I)^{n_i}(x) = T(0) = 0$. From (5.3) there follows

\[
(T - p_i I)^{n_i}\, T(x) = 0
\]

showing that $T(x)$ is in $V_i$. Hence $T$ maps $V_i$ to itself.


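It is convenient to associate with $\varphi(t)$ the polynomials $f_i(t)$ $(i = 1, \ldots, r)$ defined by

\[
f_i(t) = (t - p_1)^{n_1} \cdots (t - p_{i-1})^{n_{i-1}} (t - p_{i+1})^{n_{i+1}} \cdots (t - p_r)^{n_r} \tag{5.4}
\]

Then clearly

\[
\varphi(t) = (t - p_i)^{n_i} f_i(t) \tag{5.5}
\]

and $f_1(t), f_2(t), \ldots, f_r(t)$ have no common factors other than constants. That is, their greatest common divisor is 1. Therefore there exist polynomials $a_1(t), \ldots, a_r(t)$ with coefficients in $K$ such that

\[
1 = f_1(t) a_1(t) + \cdots + f_r(t) a_r(t) \tag{5.6}
\]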


(See Theorem 8.4 and Exercise 8, Sec. 3, Chap 6.) 
The main result of this section is the following theorem: 


THEOREM 5.1 Let $T$ be a linear mapping of an $n$-dimensional vector space $U$ to itself, and suppose that the characteristic polynomial of $T$ is $\varphi(t) = (t - p_1)^{n_1} \cdots (t - p_r)^{n_r}$, where $p_1, \ldots, p_r$ are the distinct eigenvalues of $T$. Then $U$ is the direct sum $U = V_1 \oplus \cdots \oplus V_r$ of the characteristic subspaces. These subspaces are $T$-stable; and if $T_i$ denotes the restriction of $T$ to $V_i$, then the characteristic polynomial of $T_i$ is $(t - p_i)^{n_i}$. Finally, $\dim V_i = n_i$.
Proof. We have already shown that the characteristic subspaces $V_i$ are $T$-stable. Now let $x$ be any vector in $U$. To (5.6) let us apply the substitution homomorphism that replaces $t$ by $T$. The result is

\[
I = f_1(T) \circ a_1(T) + \cdots + f_r(T) \circ a_r(T) \qquad \text{(Theorem 4.4, Chap. 11)} \tag{5.7}
\]

Hence

\[
x = I(x) = f_1(T) \circ a_1(T)(x) + \cdots + f_r(T) \circ a_r(T)(x)
\]

Write

\[
x_i = f_i(T) \circ a_i(T)(x) \tag{5.8}
\]

so that the preceding equation is

\[
x = x_1 + \cdots + x_r \tag{5.9}
\]


Now apply the operator $(T - p_i I)^{n_i}$ to $x_i$. From (5.5),

\[
\varphi(T) = (T - p_i I)^{n_i} \circ f_i(T)
\]

whence, by (5.8),

\[
(T - p_i I)^{n_i}(x_i) = \varphi(T) \circ a_i(T)(x) = 0
\]

by the Cayley-Hamilton theorem (Theorem 5.2, Chap. 11). Therefore $x_i$ is in the characteristic subspace $V_i$. We have shown that every vector $x$ in $U$ can be expressed in at least one way as a sum (5.9) of elements in the $V_i$. We must show now that the expression is unique. Let $x = x_1' + \cdots + x_r'$ be another expression of the same type, that is, with $x_i'$ in $V_i$. Subtracting the two we get

\[
0 = y_1 + \cdots + y_r
\]

where $y_i = x_i' - x_i$ is again in $V_i$. We have only to show that the $y_i$ are all zero.

If $v$ is any vector in $V_i$, then $(T - p_i I)^{n_i}(v) = 0$, by definition. Now the polynomial $f_j(t)$ contains the factor $(t - p_i)^{n_i}$ if $j \ne i$, and so

\[
f_j(T)(v) = 0 \qquad \text{if } v \text{ is in } V_i \text{ and } i \ne j \tag{5.10}
\]

Then from the identity (5.7) we find that

\[
v = f_i(T) \circ a_i(T)(v) \qquad \text{for } v \text{ in } V_i \tag{5.11}
\]

since $f_j(T) \circ a_j(T)(v) = 0$ for $j \ne i$, by (5.10).

Now apply the operator $f_i(T) \circ a_i(T)$ to the equation

\[
0 = y_1 + \cdots + y_r
\]

From (5.10) we obtain

\[
0 = f_i(T) \circ a_i(T)(y_i)
\]

and the right-hand side is equal to $y_i$, by (5.11). Thus $y_i = 0$, and we have proved that $U = V_1 \oplus \cdots \oplus V_r$, and therefore

\[
T = T_1 \oplus \cdots \oplus T_r
\]

by Proposition 3.5, where $T_i: V_i \to V_i$ is the restriction of $T$ to $V_i$. Observe that (5.11) says that $a_i(T)$ is the inverse of $f_i(T)$ on the subspace $V_i$. That is,

\[
a_i(T_i) = f_i(T_i)^{-1}
\]


The operators

\[
f_i(T) \circ a_i(T)
\]

are precisely the projection operators $P_i$ discussed in connection with (3.1).

To complete the proof of our theorem, we start with the fact that $(T_i - p_i I)^{n_i} = 0$ on $V_i$ (here $I$ denotes the identity on $V_i$). Hence the characteristic polynomial of $T_i$ must be $(t - p_i)^{k_i}$ for some integer $k_i$. This follows at once from Theorem 4.6, Chap. 4. For the minimal polynomial of $T_i$ must divide $(t - p_i)^{n_i}$ and must therefore be a power of $t - p_i$. The minimal and characteristic polynomials of $T_i$ have the same irreducible factors, namely $t - p_i$, and so the characteristic polynomial of $T_i$ is also a power of $t - p_i$. By Theorem 5.7, Chap. 11, we have

\[
\varphi(t) = (t - p_1)^{k_1} \cdots (t - p_r)^{k_r}
\]

the product of the characteristic polynomials of the $T_i$. Comparing this with (5.2) we conclude that $k_1 = n_1, \ldots, k_r = n_r$. Q.E.D.
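The projection operators $f_i(T) \circ a_i(T)$ appearing in the proof can be computed explicitly in small cases. The Python sketch below is an illustration only: the matrix `M` is a hypothetical example, not one taken from the text, with characteristic polynomial $(t - 2)^2 (t - 3)$, so that $f_1(t) = t - 3$ and $f_2(t) = (t - 2)^2$. One checks by expansion that $1 = f_1(t)(1 - t) + f_2(t) \cdot 1$, so $a_1(t) = 1 - t$ and $a_2(t) = 1$ serve in (5.6). The helper names `matmul` and `shift` are likewise assumptions.

```python
def matmul(x, y):
    """Product of two square matrices given as lists of rows."""
    n = len(x)
    return [[sum(x[i][k] * y[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def shift(m, c):
    """Return m - c*I."""
    n = len(m)
    return [[m[i][j] - (c if i == j else 0) for j in range(n)]
            for i in range(n)]


# Hypothetical matrix with characteristic polynomial (t - 2)^2 (t - 3).
M = [[2, 1, 0],
     [0, 2, 0],
     [0, 0, 3]]

f1 = shift(M, 3)                                   # f_1(M) = M - 3I
a1 = [[-v for v in row] for row in shift(M, 1)]    # a_1(M) = I - M
P1 = matmul(f1, a1)                                # projection onto V_1
P2 = matmul(shift(M, 2), shift(M, 2))              # f_2(M) a_2(M) = (M - 2I)^2

print(P1)  # [[1, 0, 0], [0, 1, 0], [0, 0, 0]]
print(P2)  # [[0, 0, 0], [0, 0, 0], [0, 0, 1]]
```

Here $P_1 + P_2$ is the identity and $P_1 P_2 = 0$, which is the matrix form of the unique decomposition $x = x_1 + x_2$ established in the proof.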

EXERCISES 


In the following exercises $T$ denotes a linear mapping of a vector space to itself, and $\operatorname{mat}_B T$ denotes the matrix of $T$ relative to some base $B$.


1. 


Find the characteristic subspaces of T if 


-1 0 
ite T= 
mast (“1 _°) 


where B = {e, ev. 


2. 


Find the characteristic subspaces if 


35 
ty T = 
mate = (5 8) 


. Find the characteristic subspaces if 


ou 
ite T = 
mot (22) 


Find the characteristic subspaces if 


3 5 0 
mateT=[0 3 6 


Oo 0 2 


. Find the characteristic subspaces if 


3 5 0 
mateT=[0 3 7 
0 0 8, 


Find the characteristic subspaces if 


5 1 -1 
mats T =[0 2 L 




7. What are the characteristic subspaces of a nilpotent mapping?

8. Let $T$ be a linear mapping of an $n$-dimensional vector space $U$ over a field $K$, and let $p$ be an element of $K$. Let $V_p$ denote the kernel of $(T - pI)^n$.

(a) Prove that $V_p \ne 0$ if and only if $p$ is an eigenvalue of $T$.

(b) Prove that $V_p = \operatorname{Ker} (T - pI)^m$, where $m$ is the highest power of $t - p$ that divides the characteristic polynomial of $T$.

(c) Prove that $\dim V_p = m$.

(d) Let $T'$ denote the restriction of $T$ to $V_p$. Prove that the characteristic polynomial of $T'$ is $(t - p)^m$.


6. The Jordan normal form 


Again let $T: U \to U$ be a linear mapping of an $n$-dimensional vector space $U$ over a field $K$, and assume that the characteristic polynomial $\varphi(t)$ can be factored completely:

\[
\varphi(t) = (t - p_1)^{n_1} (t - p_2)^{n_2} \cdots (t - p_r)^{n_r} \tag{6.1}
\]

where $p_1, p_2, \ldots, p_r$ are the distinct eigenvalues of $T$, all in the field $K$. Again let

\[
V_i = \operatorname{Ker} (T - p_i I)^{n_i} \tag{6.2}
\]

be the characteristic subspace belonging to $p_i$. By Theorem 5.1 we have

\[
U = V_1 \oplus \cdots \oplus V_r \tag{6.3}
\]

and

\[
T = T_1 \oplus \cdots \oplus T_r \tag{6.4}
\]

where $T_i: V_i \to V_i$ is the restriction of $T$ to the stable subspace $V_i$.

By (6.2) we have $(T_i - p_i I)^{n_i} = 0$ on $V_i$, and therefore we can apply the results of Sec. 4 to the nilpotent operator $T_i - p_i I$. By Theorem 4.2 the subspace $V_i$ can in turn be split into a direct sum

\[
V_i = W_{i1} \oplus \cdots \oplus W_{ik_i} \tag{6.5}
\]

of a certain number $k_i$ of $(T_i - p_i I)$-stable subspaces, on each of which $T_i - p_i I$ is nilcyclic. The $W_{ij}$ are in fact $T$-stable, for let $x$ be in $W_{ij}$. Then $(T - p_i I)(x) = T_i(x) - p_i x$ is in $W_{ij}$, since $W_{ij}$ is stable for $T_i - p_i I$. But $p_i x$ is certainly in $W_{ij}$, and therefore $T(x)$ must also be in $W_{ij}$, being equal to $p_i x$ plus an element of $W_{ij}$. Thus, denoting by $T_{ij}$ the restriction of $T$ to $W_{ij}$, we have

\[
T_i = T_{i1} \oplus \cdots \oplus T_{ik_i} \tag{6.6}
\]

by Proposition 3.5.
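Now let us determine the matrix of the mapping $T_{ij}$ of $W_{ij}$. Since $T_{ij} - p_i I$ is nilcyclic, there is a cyclic base $B_{ij}$ for $W_{ij}$. For simplicity of notation, suppose that $B_{ij} = \{e_1, e_2, \ldots, e_q\}$. This is a cyclic base for the operator $T_{ij} - p_i I$, and therefore (cf. Sec. 2, Type II) the matrix of $T_{ij} - p_i I$ relative to that base must be the $q \times q$ matrix

\[
\begin{pmatrix}
0 & 1 & 0 & \cdots & 0 \\
0 & 0 & 1 & \cdots & 0 \\
\vdots & & & \ddots & \vdots \\
0 & 0 & 0 & \cdots & 1 \\
0 & 0 & 0 & \cdots & 0
\end{pmatrix} \tag{6.7}
\]

Now

\[
T_{ij} = p_i I + (T_{ij} - p_i I)
\]

and consequently the matrix of $T_{ij}$ relative to the base in question is equal to $p_i \times$ identity matrix plus the matrix (6.7). That is, denoting by $\operatorname{mat}_{B_{ij}} T_{ij}$ the matrix of $T_{ij}$, we have

\[
\operatorname{mat}_{B_{ij}} T_{ij} = \begin{pmatrix}
p_i & 1 & & & \\
 & p_i & 1 & & \\
 & & \ddots & \ddots & \\
 & & & p_i & 1 \\
 & & & & p_i
\end{pmatrix} \tag{6.8}
\]

consisting of $p_i$ along the main diagonal and 1's on the diagonal just above the main diagonal.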
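A block of the type just described is easy to generate and test. The sketch below is illustrative only; the function names `jordan_block`, `matmul` and `matpow` are hypothetical, and the values $p = 5$, $q = 3$ are arbitrary. It builds the block and confirms that subtracting $p$ times the identity leaves a matrix whose $q$th power is zero while its $(q - 1)$st power is not.

```python
def jordan_block(p, q):
    """q x q matrix with p on the main diagonal and 1's just above it."""
    return [[p if i == j else (1 if j == i + 1 else 0) for j in range(q)]
            for i in range(q)]


def matmul(x, y):
    """Product of two square matrices given as lists of rows."""
    n = len(x)
    return [[sum(x[i][k] * y[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def matpow(m, k):
    """k-th power of a square matrix, for k >= 1."""
    result = m
    for _ in range(k - 1):
        result = matmul(result, m)
    return result


p, q = 5, 3
block = jordan_block(p, q)
# Subtracting p times the identity leaves the nilcyclic part.
nil = [[block[i][j] - (p if i == j else 0) for j in range(q)]
       for i in range(q)]

print(block)            # [[5, 1, 0], [0, 5, 1], [0, 0, 5]]
print(matpow(nil, 2))   # [[0, 0, 1], [0, 0, 0], [0, 0, 0]]
print(matpow(nil, 3))   # [[0, 0, 0], [0, 0, 0], [0, 0, 0]]
```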


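For each $W_{ij}$ we have a cyclic base $B_{ij}$ (for $T_i - p_i I$). Let $B_i = \{B_{i1}, \ldots, B_{ik_i}\}$ be the joint base for $V_i$ obtained from the bases of the various subspaces $W_{i1}, \ldots, W_{ik_i}$ (Proposition 3.3). Similarly, let $B = \{B_1, \ldots, B_r\}$ be the base for $U$ obtained by joining the bases $B_i$ of the characteristic subspaces $V_i$. Then by Proposition 3.6 we have

\[
\operatorname{mat}_B T = \begin{pmatrix}
\operatorname{mat}_{B_1} T_1 & & & \\
 & \operatorname{mat}_{B_2} T_2 & & \\
 & & \ddots & \\
 & & & \operatorname{mat}_{B_r} T_r
\end{pmatrix} \tag{6.9}
\]

where $\operatorname{mat}_B T$ stands for the matrix of $T$ relative to the base $B$, etc., and where the entries in (6.9) off of the diagonal blocks are all zero. In turn, from (6.6) we have

\[
\operatorname{mat}_{B_i} T_i = \begin{pmatrix}
\operatorname{mat}_{B_{i1}} T_{i1} & & \\
 & \ddots & \\
 & & \operatorname{mat}_{B_{ik_i}} T_{ik_i}
\end{pmatrix} \tag{6.10}
\]

with zeros off the diagonal blocks.

We summarize our findings in the following theorem: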
We summarize our findings in the following theorem: 


THEOREM 6.1 (Jordan normal form theorem) Let $T$ be a linear mapping of a finite-dimensional vector space $U$ to itself, and assume that the characteristic polynomial of $T$ has all its roots in the field of scalars. Then there is a base $B$ in $U$ with respect to which the matrix of $T$ has a diagonal block form in which each block is of the type (6.8).

As might be expected, the process of determining the promised base $B$ for which the matrix of $T$ is in Jordan normal form is rather tedious. However, the procedure is reasonably straightforward.

Suppose, as is usually the case, that $T$ is specified by means of its matrix relative to some base $A$ in the vector space $U$. The first step is to compute the characteristic polynomial $\varphi(t)$ of $T$ from that matrix and to determine the roots of $\varphi(t)$. Suppose that has been done, so that we have $\varphi(t)$ in the form (6.1).

The next task, rather arduous, is to compute the characteristic subspaces $V_i$. Recall that $V_i = \operatorname{Ker} S_i$, where $S_i$ stands for $(T - p_i I)^{n_i}$. To do this we proceed as follows: First compute the matrix of $S_i$ relative to the given base $A$. It is

\[
\operatorname{mat}_A S_i = (\operatorname{mat}_A T - p_i I)^{n_i}
\]

Only simple matrix operations are required for this step. To determine $\operatorname{Ker} S_i$ we use the method of Sec. 9, Chap. 9. Thus by a series of elementary row and column operations on $\operatorname{mat}_A S_i$ we can determine a pair of bases $A_i'$ and $A_i''$ in $U$ with respect to which $S_i$ has a diagonal matrix. The kernel of $S_i$ can then be read off immediately: $V_i$ is generated by the elements of $A_i'$ which are mapped by $S_i$ into zero, and they are the elements that correspond to zero diagonal elements in the matrix of $S_i$ relative to the base pair $A_i'$, $A_i''$.

The subset of $A_i'$ just described is a base for $V_i$; call it $A_i$. It is easy to compute how $T$ operates on the elements of $A_i$, and doing so we obtain the matrix of $T_i$ relative to $A_i$, where $T_i$ is the restriction of $T$ to the stable subspace $V_i$.

The last step is the decomposition of the nilpotent operators $T_i - p_i I$ into nilcyclic parts by the method described in Sec. 4.
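The step of computing $V_i = \operatorname{Ker} (T - p_i I)^{n_i}$ can be sketched in Python. Note the substitution: instead of the reduction to a diagonal base pair described above, this sketch uses ordinary Gauss-Jordan row elimination over the rationals, which yields a base of the same kernel. The function names `kernel_basis`, `matmul`, `shift` and the matrix `M` are hypothetical; `M` is an assumed example with characteristic polynomial $(t - 2)^2 (t - 3)$.

```python
from fractions import Fraction


def matmul(x, y):
    """Product of two square matrices given as lists of rows."""
    n = len(x)
    return [[sum(x[i][k] * y[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def shift(m, c):
    """Return m - c*I."""
    n = len(m)
    return [[m[i][j] - (c if i == j else 0) for j in range(n)]
            for i in range(n)]


def kernel_basis(matrix):
    """Base of {x : matrix x = 0}, found by Gauss-Jordan elimination."""
    rows = [[Fraction(v) for v in row] for row in matrix]
    n_cols = len(rows[0])
    pivots = []  # pivots[k] is the pivot column of reduced row k
    r = 0
    for c in range(n_cols):
        pivot = next((i for i in range(r, len(rows)) if rows[i][c] != 0), None)
        if pivot is None:
            continue
        rows[r], rows[pivot] = rows[pivot], rows[r]
        lead = rows[r][c]
        rows[r] = [v / lead for v in rows[r]]
        for i in range(len(rows)):
            if i != r and rows[i][c] != 0:
                factor = rows[i][c]
                rows[i] = [a - factor * b for a, b in zip(rows[i], rows[r])]
        pivots.append(c)
        r += 1
    basis = []
    for f in (c for c in range(n_cols) if c not in pivots):
        vec = [Fraction(0)] * n_cols
        vec[f] = Fraction(1)  # one free coordinate set to 1
        for k, c in enumerate(pivots):
            vec[c] = -rows[k][f]  # pivot coordinates are then forced
        basis.append(vec)
    return basis


# Hypothetical matrix with eigenvalue 2 (multiplicity 2) and 3 (multiplicity 1).
M = [[2, 1, 0],
     [0, 2, 0],
     [0, 0, 3]]

V1 = kernel_basis(matmul(shift(M, 2), shift(M, 2)))  # Ker (M - 2I)^2
V2 = kernel_basis(shift(M, 3))                       # Ker (M - 3I)
print(V1)
print(V2)
```

The dimensions of the two kernels are 2 and 1, in agreement with the multiplicities of the eigenvalues as Theorem 5.1 requires.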


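EXAMPLE Let $T: U \to U$ have the following matrix with respect to a given base $A = \{u_1, u_2, u_3, u_4\}$:

\[
\operatorname{mat}_A T = \begin{pmatrix}
2 & 1 & -6 & -6 \\
0 & 2 & 0 & 0 \\
-3 & -1 & 5 & 6 \\
3 & 1 & -6 & -7
\end{pmatrix}
\]

A simple calculation gives us the characteristic polynomial:

\[
\varphi(t) = (t + 1)^2 (t - 2)^2
\]

The characteristic subspaces are therefore

\[
V_1 = \operatorname{Ker} (T + I)^2 \qquad V_2 = \operatorname{Ker} (T - 2I)^2
\]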


For the first operator we have

\[
\operatorname{mat}_A (T + I)^2 = \begin{pmatrix}
9 & 6 & -18 & -18 \\
0 & 9 & 0 & 0 \\
-9 & -6 & 18 & 18 \\
9 & 6 & -18 & -18
\end{pmatrix}
\]

obviously of rank 2. It is a trivial matter to reduce it to diagonal form by the method of Sec. 9, Chap. 9. Thus, following (9.1), Chap. 9, introduce the base $A'' = \{v_1, \ldots, v_4\}$ defined by

\[
v_1 = u_1 - u_3 + u_4, \qquad v_j = u_j \quad (j = 2, 3, 4)
\]

It is quickly seen that the matrix of $(T + I)^2$ relative to the base pair $A$, $A''$ (that is, $(T + I)^2(u_j)$ expressed in terms of the $v_i$) is

\[
\begin{pmatrix}
9 & 6 & -18 & -18 \\
0 & 9 & 0 & 0 \\
0 & 0 & 0 & 0 \\
0 & 0 & 0 & 0
\end{pmatrix}
\]

in conformity with the general rules of Sec. 9, Chap. 9. Now following (9.6), Chap. 9, define the base $A' = \{u_1', \ldots, u_4'\}$ by

\[
u_1' = u_1, \qquad u_2' = -\tfrac{2}{3} u_1 + u_2, \qquad u_3' = 2u_1 + u_3, \qquad u_4' = 2u_1 + u_4
\]

Then the matrix of $(T + I)^2$ relative to the base pair $A'$, $A''$ is

\[
\begin{pmatrix}
9 & 0 & 0 & 0 \\
0 & 9 & 0 & 0 \\
0 & 0 & 0 & 0 \\
0 & 0 & 0 & 0
\end{pmatrix}
\]

It is immediate that $\operatorname{Ker} (T + I)^2$ is generated by $u_3'$, $u_4'$. Thus

\[
V_1 \text{ has the base } \{\, 2u_1 + u_3,\; 2u_1 + u_4 \,\}
\]

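In an entirely analogous way we calculate

\[
\operatorname{mat}_A (T - 2I)^2 = \begin{pmatrix}
0 & 0 & 18 & 18 \\
0 & 0 & 0 & 0 \\
9 & 0 & -9 & -18 \\
-9 & 0 & 18 & 27
\end{pmatrix}
\]

and from this

\[
V_2 \text{ has the base } \{\, u_1 - u_3 + u_4,\; u_2 \,\}
\]

Denote the four vectors just obtained by $e_1, \ldots, e_4$. Thus

\[
e_1 = 2u_1 + u_3, \qquad e_2 = 2u_1 + u_4, \qquad e_3 = u_1 - u_3 + u_4, \qquad e_4 = u_2
\]

A simple calculation shows that $T + I$ maps $e_1$ and $e_2$ into zero. That is, $T(e_1) = -e_1$ and $T(e_2) = -e_2$, and $e_1$, $e_2$ are eigenvectors belonging to the characteristic root $-1$. Again a trivial calculation shows that $T - 2I$ maps $e_4 \to e_3$ and $e_3 \to 0$. Hence the matrix of $T$ relative to the new base $B = \{e_1, e_2, e_3, e_4\}$ is

\[
\operatorname{mat}_B T = \left(\begin{array}{cc|cc}
-1 & 0 & 0 & 0 \\
0 & -1 & 0 & 0 \\
\hline
0 & 0 & 2 & 1 \\
0 & 0 & 0 & 2
\end{array}\right)
\]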

This matrix is indeed in Jordan normal form. 
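The result of the example can be confirmed numerically. The Python sketch below is illustrative and rests on stated assumptions: the entries of `A` are read as the values consistent with the kernels and eigenvectors found above (second row beginning with 0, third row beginning with $-3$), the columns of `P` hold $e_1 = 2u_1 + u_3$, $e_2 = 2u_1 + u_4$, $e_3 = u_1 - u_3 + u_4$, $e_4 = u_2$ in $A$-coordinates, and the helper names are hypothetical. As before, $AP = PJ$ is checked in place of computing $P^{-1}AP$.

```python
def matmul(x, y):
    """Product of two square matrices given as lists of rows."""
    n = len(x)
    return [[sum(x[i][k] * y[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def shifted_square(m, c):
    """Return (m - c*I)^2."""
    n = len(m)
    s = [[m[i][j] - (c if i == j else 0) for j in range(n)] for i in range(n)]
    return matmul(s, s)


def det(m):
    """Determinant by expansion along the first row (adequate for 4 x 4)."""
    if len(m) == 1:
        return m[0][0]
    total = 0
    for j, entry in enumerate(m[0]):
        minor = [row[:j] + row[j + 1:] for row in m[1:]]
        total += (-1) ** j * entry * det(minor)
    return total


# Assumed matrix of T relative to A = {u1, u2, u3, u4}.
A = [[2, 1, -6, -6],
     [0, 2, 0, 0],
     [-3, -1, 5, 6],
     [3, 1, -6, -7]]

# Columns: e1, e2, e3, e4 expressed in terms of u1, ..., u4.
P = [[2, 2, 1, 0],
     [0, 0, 0, 1],
     [1, 0, -1, 0],
     [0, 1, 1, 0]]

# Jordan normal form found in the example.
J = [[-1, 0, 0, 0],
     [0, -1, 0, 0],
     [0, 0, 2, 1],
     [0, 0, 0, 2]]

print(shifted_square(A, -1))           # matrix of (T + I)^2
print(shifted_square(A, 2))            # matrix of (T - 2I)^2
print(det(P) != 0)                     # the e_i form a base
print(matmul(A, P) == matmul(P, J))    # T has matrix J relative to B
```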
We now require the following result: 
THEOREM 6.2 Let $T: U \to U$ be a linear mapping of a finite-dimensional vector space over a field $K$. A necessary and sufficient condition for $T$ to be a diagonal mapping is that there exist distinct elements in $K$, say $p_1, \ldots, p_r$, such that

\[
(T - p_1 I)(T - p_2 I) \cdots (T - p_r I) = 0 \tag{6.11}
\]

Proof. If $T$ is diagonal, then by definition (Sec. 2, Type III) there is a base $\{e_1, \ldots, e_n\}$ of $U$ such that $T(e_i) = p_i e_i$ for $i = 1, \ldots, n$, where each $p_i$ is in $K$. Let $p_1, \ldots, p_r$ be the distinct $p$'s. Then the operator $(T - p_1 I) \cdots (T - p_r I)$ clearly maps each $e_i$ into zero and consequently maps all of $U$ into zero. Hence (6.11) holds.

Conversely, let (6.11) hold for $T$. If a factor $T - p_i I$ has an inverse on $U$, then it can simply be omitted from (6.11), and we suppose that is done. Then the spaces

\[
U_i = \operatorname{Ker} (T - p_i I) \qquad (i = 1, \ldots, r) \tag{6.12}
\]

all have positive dimension. Now reasoning just as in the proof of Theorem 5.1, with the $U_i$ in place of the characteristic subspaces and with $f(t) = (t - p_1) \cdots (t - p_r)$ in place of the characteristic polynomial $\varphi(t)$, we conclude that $U = U_1 \oplus \cdots \oplus U_r$. On $U_i$ the mapping $T - p_i I$ is zero. That is, $T$ restricted to $U_i$ is the same as $p_i I$. If $B_i$ is any base for $U_i$, and if $B = \{B_1, \ldots, B_r\}$ is the joint base in $U$, then it follows at once that $\operatorname{mat}_B T$ is diagonal. Q.E.D.

COROLLARY Let $T$ be a diagonal mapping of a finite-dimensional vector space $U$ to itself, and let $W$ be a $T$-stable subspace. Then the restriction $T_W$ of $T$ to $W$ is also a diagonal mapping.

Proof. By Theorem 6.2, $T$ satisfies an equation of type (6.11). But then $T_W$ plainly satisfies the same equation, and therefore $T_W$ is diagonal, by Theorem 6.2 again. Q.E.D.


THEOREM 6.3 Let $T: U \to U$ be a linear mapping of a finite-dimensional vector space over a field $K$, and suppose that the characteristic polynomial of $T$ has all its roots in $K$. Then there are mappings $S$ and $N$ of $U$ to itself such that

\[
\begin{aligned}
&S \text{ is a diagonal mapping} \\
&N \text{ is nilpotent} \\
&SN = NS \\
&T = S + N
\end{aligned} \tag{6.13}
\]

Moreover, $S$ and $N$ are uniquely determined by these conditions.


Proof. The existence of $S$ and $N$ is rather obvious from the Jordan normal form. For if the matrix of $T$ is in Jordan normal form, then it has the eigenvalues of $T$ along its diagonal and a certain number of 1's just above the diagonal, all other entries being zero. Therefore the matrix of $T$ can be written as the sum of a diagonal matrix and another matrix having as its only nonzero entries a certain number of 1's just above the diagonal. It is not hard to verify that the two matrices commute and define endomorphisms of $U$ satisfying (6.13).

To write this out more systematically, let $V_i$ be the characteristic subspace of $U$ corresponding to the eigenvalue $p_i$ (notation as earlier in this section). Thus

\[
V_i = \operatorname{Ker} (T - p_i I)^{n_i} \qquad (i = 1, \ldots, r) \tag{6.14}
\]

Let $T_i$ and $I_i$ denote the restrictions of $T$ and $I$ to $V_i$, and put

\[
N_i = T_i - p_i I_i \qquad S_i = p_i I_i \tag{6.15}
\]

The operator $N_i$ is nilpotent, by (6.14), and $S_i$ is a diagonal mapping, clearly. We have

\[
T_i = S_i + N_i
\]

and it is clear that $S_i \circ N_i = N_i \circ S_i$. Set

\[
S = S_1 \oplus \cdots \oplus S_r \qquad N = N_1 \oplus \cdots \oplus N_r
\]

(see Proposition 3.4). A direct sum of nilpotent mappings is again nilpotent, as we have had occasion to observe at the beginning of Sec. 4. Hence $N$ is nilpotent. $S$ is clearly a diagonal mapping. For let $B_i$ be any base in $V_i$. The matrix of $S_i$ for that base is just $p_i \times$ identity. Hence the matrix of $S$ relative to the joint base $B = \{B_1, \ldots, B_r\}$ in $U$ is a diagonal matrix. Furthermore $S \circ N = N \circ S$, and it quickly follows that $S$ and $N$ as just defined satisfy (6.13).

To show uniqueness, let S', N' be another pair of mappings satisfying
(6.13). We observe that we have then

$S' \circ T = S' \circ (S' + N') = S' \circ S' + S' \circ N' = S' \circ S' + N' \circ S' = (S' + N') \circ S' = T \circ S'$

that is, S' commutes with T.

Let x be an element of the characteristic subspace $V_i$. Then S'(x) is
also in $V_i$. Indeed,

$(T - \rho_i I)^{n_i} S'(x) = S' (T - \rho_i I)^{n_i} (x) = 0$

since S' commutes with T, hence also with any polynomial in T. Therefore
the subspaces $V_i$ are all S'-stable.

Denote by $S'_i$ the restriction of S' to $V_i$. By assumption, S' is a diagonal
mapping, and consequently so is $S'_i$, by the corollary to Theorem 6.2.
Then there is a base $B'_i$ in $V_i$ such that the matrix of $S'_i$ relative to it is
diagonal. Referring to (6.15), the matrix of $S_i$ relative to $B'_i$ is also
diagonal, namely, $\rho_i \times$ identity. It follows at once that $S_i \circ S'_i = S'_i \circ S_i$
and that $S_i - S'_i$ is a diagonal mapping. S - S' is therefore diagonal,
because its matrix relative to the joint base $B' = \{B'_1, \ldots, B'_r\}$ is diagonal.
Furthermore, $S \circ S' = S' \circ S$.

We have now T = S + N = S' + N'. Hence

$N \circ N' = (T - S) \circ (T - S')$
$N' \circ N = (T - S') \circ (T - S)$

Since S, T, S' all commute, it follows that $N \circ N' = N' \circ N$.

Now N and N' are nilpotent, and so $N^m = (N')^m = 0$ for some integer m.
From the binomial theorem (valid here because N' and N commute; see
Theorem 5.2, Chap. 3), we get

$(N' - N)^{2m} = \sum_{k=0}^{2m} \binom{2m}{k} (-1)^k \, N^k \circ (N')^{2m-k} = 0$

for in each term at least one of the two exponents k, 2m - k must be $\ge m$.

Consider now the equation

$S - S' = N' - N$

As we have just seen, the left side is a diagonal mapping, and the right side
is nilpotent. But, as was remarked in Sec. 2, a diagonal mapping cannot
be nilpotent unless it is zero. Therefore from the last equation we conclude
that S - S' = 0. Hence S = S', N = N'. Q.E.D.
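The existence half of the argument can be checked on a small case. The following is a minimal sketch in Python, assuming a matrix that is already in Jordan normal form with 1's just above the diagonal; the helper names `matmul` and `split_jordan` and the sample matrix are illustrative assumptions, not taken from the text.

```python
def matmul(a, b):
    """Product of two matrices given as lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]

def split_jordan(t):
    """Split a Jordan matrix into its diagonal part S and the remainder N."""
    n = len(t)
    s = [[t[i][j] if i == j else 0 for j in range(n)] for i in range(n)]
    nil = [[t[i][j] - s[i][j] for j in range(n)] for i in range(n)]
    return s, nil

# Hypothetical sample: blocks of sizes 2 and 1 for eigenvalue 5,
# and one block of size 2 for eigenvalue -1.
T = [[5, 1, 0, 0, 0],
     [0, 5, 0, 0, 0],
     [0, 0, 5, 0, 0],
     [0, 0, 0, -1, 1],
     [0, 0, 0, 0, -1]]

S, N = split_jordan(T)

# T = S + N by construction; check the remaining conditions of (6.13).
commute = matmul(S, N) == matmul(N, S)

power = N
for _ in range(len(T) - 1):          # compute N^n, n = size of T
    power = matmul(power, N)
nilpotent = all(x == 0 for row in power for x in row)

print(commute, nilpotent)
```

The two matrices commute here because S is a scalar multiple of the identity on each Jordan block, which is the point of the construction of $S_i$ and $N_i$ above.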


EXERCISES 

The following problems refer to vector spaces over the complex field. In each
of them is given the matrix of a linear mapping T relative to some base A in the
space. In each case find a new base relative to which the matrix of T is in Jordan
normal form, and write down that form.




(for various a, 5) 


-1 6 1 6 -3 1 10 12 
2 4 -6 -1 -9 8. 0 -1 @ 0 
9 0 -1 0 38 -1 -4 -6 
—2 8 1 10, —3 1 6 8. 
-1 9 6 6 —2 1000 
9. 0 -1 9 0 -2 00 06 
. 3 0 -4 — 10. 0 0000 
—3 0 6 Q 0021 
9 000 2 
1 1 2 1 2 
0 2 a -1 9 
11.f 0 0 -1 4-2 
9 0 0 3 1 
0 Q Q 0 4 


12. Find a 4 × 4 matrix which is not triangular and which has integral eigenvalues.
Reduce it to Jordan normal form.


7. Uniqueness of the Jordan normal form 


To what extent is the Jordan normal form of the matrix of a linear mapping unique?
The diagonal blocks in the Jordan normal form correspond to the T-stable subspaces
$W_{ij}$ of (6.5). By rearranging the order of the latter, we effect permutations
of the blocks (6.8) that make up the Jordan normal form. But apart from such
permutations, the Jordan normal form is unique, as we now proceed to prove. We
continue with the notation of Sec. 6.

Ignoring permutations of the blocks of type (6.8) that constitute a matrix in
Jordan normal form, the only data necessary to determine it are the following:

(1) The distinct diagonal entries $\rho_1, \ldots, \rho_r$ occurring in the blocks of type (6.8)
(2) The number $k_i$ of blocks of the type (6.8) which belong to $\rho_i$
(3) The size of each of the blocks of type (6.8)




Our task is to prove that, for any two matrices in Jordan normal form representing
the same linear mapping T, the data listed above are the same. It will follow
then that the matrices are the same, apart from permutations of the order in which
the blocks occur.

The strategy here is quite simple. We have only to show that the data above are
completely determined by T.

1. Let B' be a base for U with respect to which the matrix of T is in Jordan normal
form. Let a be that matrix, and let $\rho_1, \ldots, \rho_r$ be its distinct diagonal elements,
$\rho_i$ occurring $n_i$ times, say. Since a is a triangular matrix, we have then

$\det (tI - a) = (t - \rho_1)^{n_1} \cdots (t - \rho_r)^{n_r}$

and this must be the characteristic polynomial of T, hence depends only on T.
Therefore $\rho_1, \ldots, \rho_r$ must be the distinct eigenvalues of T.

2. The matrix a being as above, let $a_{(i)}$ be the $n_i \times n_i$ block having $\rho_i$ along the
diagonal. Then $a_{(i)}$ corresponds to a subset $B'_i$ of the base B', and $B'_i$ contains $n_i$
vectors. Let $V'_i$ be the subspace of U spanned by $B'_i$. Then $V'_i$ is T-stable,
clearly, and the characteristic polynomial of the restriction of T to $V'_i$ is $(t - \rho_i)^{n_i}$.
Hence $(T - \rho_i I)^{n_i}$ maps $V'_i$ into zero. Therefore $V'_i$ is contained in the
characteristic subspace $V_i$ belonging to $\rho_i$. It has the same dimension $n_i$, and so
$V'_i = V_i$.

Since the matrix a was supposed to be in Jordan normal form, the block $a_{(i)}$ has a
certain number $k_i$ of smaller blocks $a_{(ij)}$ arranged along its diagonal, each of the
form (6.8). And each such block corresponds to a certain subset $B'_{ij}$ of the base $B'_i$
of $V'_i$ (= $V_i$). Let $W'_{ij}$ be the subspace of $V_i$ generated by $B'_{ij}$. Then $W'_{ij}$ is T-stable,
and $T - \rho_i I$ must be nilcyclic on $W'_{ij}$, because of the form of $a_{(ij)}$, which is the
matrix of T restricted to $W'_{ij}$. We claim now that the number $k_i$ in question here is

(7.1)  $k_i = \dim \operatorname{Ker} (T - \rho_i I)$

To see this, we note first that $\operatorname{Ker} (T - \rho_i I)$ is a subspace of
$V_i = \operatorname{Ker} (T - \rho_i I)^{n_i}$, as is obvious. From just above we have

(7.2)  $V_i = W'_{i1} \oplus \cdots \oplus W'_{ik_i}$

Now $T - \rho_i I$ is nilcyclic on each $W'_{ij}$ here, and a nilcyclic mapping necessarily has a
kernel of dimension 1. It follows easily that each term of (7.2) produces a
contribution of 1 to $\dim \operatorname{Ker} (T - \rho_i I)$, from which (7.1) follows. (See Exercise 6,
Sec. 4.)

From (7.1) it is clear that $k_i$ depends only on T (and i) and consequently must be
the same for all Jordan matrices representing T.

3. Finally we must occupy ourselves with the sizes of the various blocks $a_{(ij)}$ that
form $a_{(i)}$. In other words, we must deal with the dimensions of the $W'_{ij}$ above.
These dimensions are the orders of the various nilcyclic parts into which $T - \rho_i I$ is
decomposed in $V_i$. Then let

(7.3)  $n_{i1}, n_{i2}, \ldots, n_{ik_i}$

be the dimensions of $W'_{i1}, \ldots, W'_{ik_i}$, respectively. We want to show that the set
of integers (7.3) depends only on T (and i). Observe that we do not claim that the
subspaces $W'_{ij}$ are the same for all Jordan matrices representing T.

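Now if L is any nilcyclic mapping, of order s, say, then

(7.4)  $\dim \operatorname{Ker} L^k - \dim \operatorname{Ker} L^{k-1} = \begin{cases} 1 & \text{for } 1 \le k \le s \\ 0 & \text{for } k > s \end{cases}$

as follows at once from Sec. 2. From this we shall deduce that

(7.5)  The number of integers $n_{ij}$ in (7.3) such that $n_{ij} \ge h$ is equal to
       $\dim \operatorname{Ker} (T - \rho_i I)^h - \dim \operatorname{Ker} (T - \rho_i I)^{h-1}$   for h = 1, 2, 3, etc.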


In fact, $\operatorname{Ker} (T - \rho_i I)^h$ is contained in the characteristic subspace $V_i$ for h = 0,
1, 2, etc. Denote by $L_{ij}$ the restriction of $T - \rho_i I$ to $W'_{ij}$. Then $L_{ij}$ is nilcyclic of
order $n_{ij}$ [because $a_{(ij)}$ is of the type (6.8)]. It is easily seen that $\operatorname{Ker} (T - \rho_i I)^h$ is
the direct sum

$\operatorname{Ker} L_{i1}^h \oplus \cdots \oplus \operatorname{Ker} L_{ik_i}^h$

because of (7.2). Thus

$\dim \operatorname{Ker} (T - \rho_i I)^h = \sum_{j=1}^{k_i} \dim \operatorname{Ker} L_{ij}^h$

Then (7.5) follows easily from (7.4) applied to the $L_{ij}$. Formula (7.5) shows that
the set of numbers (7.3) depends only on T and i and not upon the particular choice
of $W'_{ij}$. (See Exercise 7, Sec. 4.)

This completes the proof of the uniqueness of the Jordan normal form. 
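Formulas (7.1) and (7.5) also give a way to read off the block sizes from kernel dimensions alone. The sketch below is one possible illustration in Python using exact rational arithmetic; the function names (`rank`, `kernel_dims`, `block_sizes`) and the sample matrix are assumptions made for this example, and the eigenvalue is taken as known.

```python
from fractions import Fraction

def matmul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]

def rank(m):
    """Rank by Gaussian elimination over the rationals."""
    rows = [[Fraction(x) for x in row] for row in m]
    r = 0
    for c in range(len(rows[0])):
        pivot = next((i for i in range(r, len(rows)) if rows[i][c] != 0), None)
        if pivot is None:
            continue
        rows[r], rows[pivot] = rows[pivot], rows[r]
        for i in range(len(rows)):
            if i != r and rows[i][c] != 0:
                f = rows[i][c] / rows[r][c]
                rows[i] = [x - f * y for x, y in zip(rows[i], rows[r])]
        r += 1
        if r == len(rows):
            break
    return r

def kernel_dims(a, rho):
    """dims[h] = dim Ker (a - rho*I)^h for h = 0, 1, ..., n."""
    n = len(a)
    l = [[a[i][j] - (rho if i == j else 0) for j in range(n)] for i in range(n)]
    dims, p = [0], l
    for _ in range(n):
        dims.append(n - rank(p))
        p = matmul(p, l)
    return dims

def block_sizes(a, rho):
    """Sizes of the Jordan blocks belonging to rho, largest first, via (7.5)."""
    n = len(a)
    dims = kernel_dims(a, rho)
    # at_least[h] = number of blocks of size >= h
    at_least = [0] + [dims[h] - dims[h - 1] for h in range(1, n + 1)] + [0]
    sizes = []
    for h in range(1, n + 1):
        sizes += [h] * (at_least[h] - at_least[h + 1])
    return sorted(sizes, reverse=True)

# Hypothetical sample matrix with eigenvalues 5 and -1.
A = [[5, 1, 0, 0, 0],
     [0, 5, 0, 0, 0],
     [0, 0, 5, 0, 0],
     [0, 0, 0, -1, 1],
     [0, 0, 0, 0, -1]]

print(kernel_dims(A, 5)[1], block_sizes(A, 5), block_sizes(A, -1))
```

Here `kernel_dims(A, 5)[1]` is the number $k_i$ of (7.1), and the differences of consecutive kernel dimensions are the counts in (7.5).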


8. The problem of similarity 


Let U be a vector space over a field K. Two linear mappings $T_1: U \to U$ and
$T_2: U \to U$ are called similar (notation: $T_1 \sim T_2$) if there is an isomorphism
$S: U \to U$ such that

(8.1)  $T_1 \circ S = S \circ T_2$

or equivalently if

(8.2)  $T_2 = S^{-1} \circ T_1 \circ S$


If we represent our mappings by arrows, then the situation at hand can be described
by the following diagram:

          T_1
     U ---------> U
     ^            ^
     |            |
   S |            | S
     |            |
     U ---------> U
          T_2

Condition (8.1) requires that the result of following the upper circuit ($T_1 \circ S$) be
the same as that of following the lower circuit ($S \circ T_2$). Note that S here is
required to be one-to-one.


Suppose that dim U = n (finite), and let $B_1 = \{u_1, \ldots, u_n\}$ and
$B_2 = \{v_1, \ldots, v_n\}$ be two bases for U. Let S be the isomorphism that maps $v_i$ to $u_i$
for i = 1, ..., n. Write $\operatorname{mat}_{B_i} T_j$ for the matrix of $T_j$ with respect to $B_i$
(i, j = 1, 2), etc. From Theorem 7.2, Chap. 9, there follows

(8.3)  $\operatorname{mat}_{B_1} T_1 = (\operatorname{mat}_{B_2} S)^{-1} (\operatorname{mat}_{B_2} T_1) (\operatorname{mat}_{B_2} S) = \operatorname{mat}_{B_2} (S^{-1} T_1 S)$

If (8.2) holds, and only then, the right-hand side here is $\operatorname{mat}_{B_2} T_2$.
Therefore $T_1 \sim T_2$ if and only if there exist bases in U for which $T_1$ and $T_2$ have
identical matrices.
Let B be a base for U and put

$a = \operatorname{mat}_B T_1$,   $b = \operatorname{mat}_B T_2$,   $c = \operatorname{mat}_B S$

Then (8.2) holds if and only if

(8.4)  $b = c^{-1} a c$

Accordingly, we define two (square) matrices a, b to be similar if and only if there
is a nonsingular matrix c for which (8.4) holds. It is easily seen from Theorem 7.2,
Chap. 9, that (8.4) holds if and only if a and b both represent the same linear
mapping with respect to (possibly) different bases, c then being the matrix of the
base change.
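Relation (8.4) is easy to try out on 2 × 2 matrices. The following sketch in Python builds $b = c^{-1} a c$ from a chosen a and a nonsingular c and then confirms the equivalent form $cb = ac$; the matrices and the helper `inverse_2x2` are illustrative assumptions, not part of the text.

```python
from fractions import Fraction

def matmul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]

def inverse_2x2(c):
    """Inverse of a nonsingular 2 x 2 matrix via the adjugate."""
    det = c[0][0] * c[1][1] - c[0][1] * c[1][0]
    if det == 0:
        raise ValueError("c must be nonsingular")
    d = Fraction(1) / det
    return [[c[1][1] * d, -c[0][1] * d],
            [-c[1][0] * d, c[0][0] * d]]

a = [[2, 1], [0, 2]]          # hypothetical matrix of T_1
c = [[1, 2], [3, 7]]          # hypothetical base change, det c = 1

b = matmul(matmul(inverse_2x2(c), a), c)   # b = c^{-1} a c, as in (8.4)

print(b)
print(matmul(c, b) == matmul(a, c))        # matrix form of (8.1)
```

Checking $cb = ac$ instead of $b = c^{-1} a c$ avoids computing an inverse, which is convenient when only a verification is wanted.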

We pose the following problem: Give reasonable conditions, necessary and sufficient,
for two linear mappings (or two matrices) to be similar.

One answer to this problem can be based on the uniqueness of the Jordan normal
form (Sec. 7).

First of all, suppose that $T_1 \sim T_2$, and let $B_1$ be a base in U for which the matrix
of $T_1$ is in Jordan normal form. Let S be an isomorphism for which (8.2) holds, and
let $B_2$ be the base in U obtained by applying $S^{-1}$ to the elements of $B_1$. By (8.3)
we have

$\operatorname{mat}_{B_1} T_1 = \operatorname{mat}_{B_2} T_2$

and so $T_2$ has the same Jordan normal form as $T_1$. Conversely, if $T_1$ and $T_2$ have
the same Jordan normal form with respect to certain bases $B_1$, $B_2$, and if S denotes
the isomorphism that transforms $B_2$ into $B_1$, then (8.2) holds, and so $T_1 \sim T_2$.

Referring now to the data of (1), (2), (3) in Sec. 7, and to the subsequent analysis
of them, we derive the following alternative formulation of the criterion for
$T_1 \sim T_2$ just given. That is, $T_1 \sim T_2$ if and only if the conditions below are
satisfied (assuming that the eigenvalues of $T_1$ and $T_2$ are all in the field of scalars K):


(8.5)  (1) $T_1$ and $T_2$ have identical characteristic polynomials.
       (2) $\dim \operatorname{Ker} (T_1 - \rho_i I)^h = \dim \operatorname{Ker} (T_2 - \rho_i I)^h$ for each eigenvalue $\rho_i$ and
           for $h = 1, \ldots, n_i$, where $n_i$ is the power of $(t - \rho_i)$ in the
           characteristic polynomial.


Condition (2) just enunciated is the necessary and sufficient condition for the
numbers $k_i$ and $n_{ij}$ to be the same (apart from permutations) for $T_1$ and $T_2$, according
to (7.1) and (7.5). Thus (1) and (2) hold if and only if $\operatorname{mat}_{B_1} T_1 = \operatorname{mat}_{B_2} T_2$ for
suitable bases which yield the Jordan normal form for $T_1$ and $T_2$.

The criterion given by (1) and (2) above applies equally well to matrices, for any
n × n matrix with coefficients in K can be regarded as a linear mapping of the
vector space $K^n$ of column vectors.

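EXAMPLE 1 Take $T_1 = \begin{pmatrix} 0 & 1 \\ -1 & 0 \end{pmatrix}$ and $T_2 = \begin{pmatrix} i & 0 \\ 0 & -i \end{pmatrix}$, where $i = \sqrt{-1}$, $T_1$ and
$T_2$ being regarded as linear mappings of the vector space $\mathbf{C}^2$ of complex column
vectors. We shall show that $T_1 \sim T_2$.

The characteristic polynomials are

$\det (tI - T_1) = \det \begin{pmatrix} t & -1 \\ 1 & t \end{pmatrix} = t^2 + 1$

$\det (tI - T_2) = \det \begin{pmatrix} t - i & 0 \\ 0 & t + i \end{pmatrix} = t^2 + 1$

The characteristic polynomials are identical and the eigenvalues are $\pm i$. The
equality (2) of (8.5) has to be checked only for h = 1, and it is easily seen that
$\dim \operatorname{Ker} (T_j - \rho I) = 1$ for $\rho = \pm i$ and j = 1, 2.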

Therefore T, ~ Ty. It is easily verified that T: = S-'T,S, where 


s-( _) 


EXAMPLE 2 Let $T_1$ be a linear mapping of an n-dimensional vector space U over
a field K, and suppose that $T_1$ has n distinct eigenvalues $\rho_1, \ldots, \rho_n$. Choose a
base $A = \{u_i\}$ in U, and let $T_2$ be the linear mapping defined by $T_2(u_i) = \rho_i u_i$
(i = 1, ..., n). Then $T_1$ and $T_2$ are similar. According to the discussion above,
this merely means that there is a base $B = \{v_i\}$ in U such that $\operatorname{mat}_B T_1 = \operatorname{mat}_A T_2$,
and the latter is a diagonal matrix. That is, $T_1 \sim T_2$ here simply means that $T_1$
is a diagonal mapping. This has already been shown in Sec. 1.




To see how the criterion (8.5) applies here, it is clear that $T_1$ and $T_2$ have the
same characteristic polynomial, namely, $(t - \rho_1) \cdots (t - \rho_n)$.

For the second part (2) of (8.5), we know that in general, if $V_i = \operatorname{Ker} (T - \rho_i I)^{n_i}$,
$n_i$ being the power of $(t - \rho_i)$ in the characteristic polynomial, then $\dim V_i = n_i$
and $n_1 + \cdots + n_r = n$. In the case at hand, $n_1 = 1, \ldots, n_n = 1$. It is only
necessary to check (2) for h = 1 here, and, as just observed, $\dim \operatorname{Ker} (T_j - \rho_i I) = 1$
for j = 1, 2 and i = 1, ..., n. Hence, condition (2) is satisfied here.
EXERCISES 


Test the following matrices for similarity (for both general and special values
of a, b, c):


0 2 ot 20 1 oo 
L d a 
( 0) a ( 0) ; ( 0) and ( >) 


00 01 a a 0 0 1 0 

3, ( 1) and ( i) 4.[0 0 6 and 0901 
o 0 9, 5 0 0 

Oa a 1 0 3.5 2 3 a 0 

5 ( 0 and (: ol 6 (: 10 and (: 3 0 
o 0 O90 4, 0 0 38, 001 


9. Elementary divisors 


Criterion (8.5) for similarity of linear mappings (or matrices) has two drawbacks: 

(1) It is usually impossible to solve a polynomial equation of degree greater 
than 4 exactly, and therefore it is usually impossible to determine the eigenvalues 
of an endomorphism exactly. 

(2) The criterion applies only in the important but rather special case in which 
the mappings have all their eigenvalues in the field of scalars K. That is no restric- 
tion if the field is the complex field; but in general the characteristic polynomial 
will not have all its roots in K. 

We shall therefore give another criterion for similarity which is not subject to 
the foregoing objections. 

We commence with some preparatory observations. Here K denotes a field.
We shall consider the polynomial domain K[t] of polynomials in an indeterminate t
with coefficients in K.

Let $f_1, \ldots, f_r$ be polynomials in K[t], and let us denote by $(f_1, \ldots, f_r)$ the
set consisting of all polynomials f which can be expressed in the form

(9.1)  $f = a_1 f_1 + \cdots + a_r f_r$

the $a_i$ being also in K[t]. If $f' = a'_1 f_1 + \cdots + a'_r f_r$ is another polynomial of the
same type, then clearly f + f' is also of the form (9.1). Let g be any polynomial
in K[t], and let f be as in (9.1). Then

$gf = (g a_1) f_1 + \cdots + (g a_r) f_r$

which is again of the type (9.1). Therefore the system $(f_1, \ldots, f_r)$ contains the
sum and difference of any two of its elements, and it contains the product of any
of its elements by an arbitrary polynomial in K[t]. In particular, $(f_1, \ldots, f_r)$
contains the product of any two of its elements. It is plain that $(f_1, \ldots, f_r)$
contains $f_1, \ldots, f_r$.

We call this system of polynomials the ideal generated by the $f_i$. As a special
case, the ideal (f) generated by a single polynomial f simply consists of all products
$a \cdot f$, where a is an arbitrary polynomial in K[t].


Let $h = \text{g.c.d.}\,\{f_1, \ldots, f_r\}$ denote the greatest common divisor of $f_1, \ldots, f_r$.
From Sec. 8, Chap. 6, we know that h can be expressed in the form (9.1). Therefore
h is in the ideal $(f_1, \ldots, f_r)$. On the other hand, h divides each $f_i$, say
$f_i = c_i \cdot h$. Therefore each $f_i$ is in the ideal generated by h. It follows that the
two ideals are identical:

(9.2)  $(f_1, \ldots, f_r) = (h)$   if   $h = \text{g.c.d.}\,\{f_1, \ldots, f_r\}$

From this it follows that

(9.3)  $(f_1, \ldots, f_r) = (g_1, \ldots, g_s)$   if and only if
       $\text{g.c.d.}\,\{f_1, \ldots, f_r\} = \text{g.c.d.}\,\{g_1, \ldots, g_s\}$
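The fact that the g.c.d. lies in the ideal can be made concrete for two generators. The sketch below, in Python with exact rational coefficients, runs the Euclidean algorithm while tracking multipliers, so that it returns h together with polynomials a, b satisfying $h = a f + b g$, an expression of the form (9.1). Polynomials are represented as coefficient lists, lowest degree first; this representation and all function names are assumptions of the example.

```python
from fractions import Fraction

def trim(p):
    p = list(p)
    while p and p[-1] == 0:
        p.pop()
    return p

def add(p, q):
    n = max(len(p), len(q))
    return trim([(p[i] if i < len(p) else 0) + (q[i] if i < len(q) else 0)
                 for i in range(n)])

def scale(p, c):
    return trim([c * x for x in p])

def mul(p, q):
    if not p or not q:
        return []
    out = [Fraction(0)] * (len(p) + len(q) - 1)
    for i, x in enumerate(p):
        for j, y in enumerate(q):
            out[i + j] += x * y
    return trim(out)

def divmod_poly(a, b):
    """a = q*b + r with r = 0 or deg r < deg b; b must be nonzero."""
    b = trim(b)
    r = trim([Fraction(x) for x in a])
    q = [Fraction(0)] * max(len(r) - len(b) + 1, 0)
    while len(r) >= len(b):
        shift = len(r) - len(b)
        c = r[-1] / b[-1]
        q[shift] = c
        r = trim([x - c * b[i - shift] if i >= shift else x
                  for i, x in enumerate(r)])
    return trim(q), r

def ext_gcd(f, g):
    """Return (h, a, b) with h the monic g.c.d. of f, g and h = a*f + b*g."""
    r0, r1 = trim([Fraction(x) for x in f]), trim([Fraction(x) for x in g])
    a0, a1 = [Fraction(1)], []
    b0, b1 = [], [Fraction(1)]
    while r1:
        q, r = divmod_poly(r0, r1)
        r0, r1 = r1, r
        a0, a1 = a1, add(a0, scale(mul(q, a1), -1))
        b0, b1 = b1, add(b0, scale(mul(q, b1), -1))
    inv = Fraction(1) / r0[-1]        # normalize to a monic g.c.d.
    return scale(r0, inv), scale(a0, inv), scale(b0, inv)

f = [-1, 0, 1]    # t^2 - 1
g = [0, 1, 1]     # t^2 + t
h, a, b = ext_gcd(f, g)
print(h, a, b)
```

For more than two generators one can apply the same routine repeatedly, since the g.c.d. of $f_1, \ldots, f_r$ is the g.c.d. of $f_r$ with the g.c.d. of the others.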
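Let $a = (a^j_i)$ be a matrix with coefficients in an integral domain. Now form
the matrix made up of those elements of a which are in columns $i_1, \ldots, i_p$ and
rows $j_1, \ldots, j_p$. The determinant of this p × p matrix is denoted by $a^{j_1 \cdots j_p}_{i_1 \cdots i_p}$.
Thus

(9.4)  $a^{j_1 \cdots j_p}_{i_1 \cdots i_p} = \det \begin{pmatrix} a^{j_1}_{i_1} & \cdots & a^{j_1}_{i_p} \\ \vdots & & \vdots \\ a^{j_p}_{i_1} & \cdots & a^{j_p}_{i_p} \end{pmatrix}$

These quantities are called the p × p minors of a. The 1 × 1 minors are simply
the entries $a^j_i$ in a. If a is square, say n × n, then the minor $a^{1 \cdots n}_{1 \cdots n}$ is none
other than det a.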

We now consider matrices with coefficients in the polynomial domain K[t].

THEOREM 9.1 Let $f = (f^j_i)$ and $g = (g^j_i)$ be two matrices with coefficients in the
polynomial ring K[t]. If there is a third matrix a, also with polynomial coefficients,
such that f = ag or else f = ga, then the p × p minors of f are all in the ideal
generated by the p × p minors of g.


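Proof. Suppose that f = ga. For simplicity of notation consider the
minor $f^{1 \cdots p}_{1 \cdots p}$ of f. We have by (9.4) and the formula (2.7), Chap. 11,

$f^{1 \cdots p}_{1 \cdots p} = \sum_s \varepsilon(s) \, f^{s(1)}_1 \cdots f^{s(p)}_p$

where s denotes an arbitrary permutation of 1, ..., p and where $\varepsilon(s)$
denotes its sign. Since $f^j_i = g^j_k a^k_i$ (Einstein summation convention on
k here), we obtain

$f^{1 \cdots p}_{1 \cdots p} = \sum_s \varepsilon(s) \, (g^{s(1)}_{k_1} a^{k_1}_1) \cdots (g^{s(p)}_{k_p} a^{k_p}_p) = g^{1 \cdots p}_{k_1 \cdots k_p} \, a^{k_1}_1 \cdots a^{k_p}_p$

(summation over $k_1, \ldots, k_p$). This is an expression of the type (9.1) for
$f^{1 \cdots p}_{1 \cdots p}$ in terms of the $g^{1 \cdots p}_{k_1 \cdots k_p}$, showing that $f^{1 \cdots p}_{1 \cdots p}$ is in the ideal
generated by the p × p minors of the matrix $(g^j_i)$. The same argument holds for any
minor $f^{j_1 \cdots j_p}_{i_1 \cdots i_p}$, and, with trivial modification, for the case f = ag. Q.E.D.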


THEOREM 9.2 Let a, b, f, g be matrices with coefficients in K[t]. Suppose that a
and b are square and that det a and det b are nonzero constants (that is, nonzero
elements of K). If

(9.5)  f = agb

then the p × p minors of f have the same g.c.d. as the p × p minors of g.†


Proof. First of all, the inverses $a^{-1}$ and $b^{-1}$ have all their coefficients
in K[t]. This follows from the fact that the coefficients of $a^{-1}$, say, are
(except for sign) the (n - 1) × (n - 1) minors of a, divided by det a.
We have assumed that det a is a nonzero constant.

Now Theorem 9.1 says that the p × p minors of f = (ag)b are in the
ideal generated by the p × p minors of ag; and these p × p minors are
in turn in the ideal generated by the p × p minors of g. Hence, the p × p
minors of f are in the ideal generated by the p × p minors of g. From (9.5) we have

$g = a^{-1} f b^{-1}$

and from the remarks above we conclude similarly that the p × p minors
of g are in the ideal generated by those of f. The assertion then follows
from (9.3). Q.E.D.

† We understand this to mean that if all the p × p minors of g are zero (the g.c.d. is not
defined) then so are all the p × p minors of f, and vice versa.




(9.6)  For any n × n matrix a with coefficients in K[t], we denote by $d_p[a]$ the
       monic g.c.d. of all the p × p minors of a, provided they are not all zero. For
       p = 0 we set $d_0[a] = 1$.


THEOREM 9.3 For any matrix a with coefficients in K[t], $d_{p-1}[a]$ divides $d_p[a]$ for
p = 1, 2, ..., r, where r is the largest integer for which the r × r minors are not all
zero.†

Proof. Consider a nonzero p × p minor of a (p > 1). Expand the
determinant that defines it by some row or column. The result is a sum
of (p - 1) × (p - 1) minors multiplied by coefficients of a (and by ±1).
Since $d_{p-1}[a]$ divides all the (p - 1) × (p - 1) minors in the sum, it
must divide the given p × p minor. The assertion follows at once from
the definition of $d_p[a]$. Q.E.D.


We can indicate the assertion of the theorem by writing

(9.7)  $d_0[a] \mid d_1[a] \mid d_2[a] \mid \cdots \mid d_r[a]$


DEFINITION 9.1 Let a be a matrix with coefficients in K[t], and let r be its rank.
Let $d_p[a]$ (p = 0, 1, ..., r) be the polynomials defined in (9.6). We denote the
quotient of $d_p[a]$ by $d_{p-1}[a]$ by $q_p[a]$. That is,

(9.8)  $d_p[a] = q_p[a] \cdot d_{p-1}[a]$   (p = 1, ..., r)

The polynomial $q_p[a]$ is called the pth torsion order of a. We define $q_p[a]$ to be zero
for p > r. If a has coefficients in the field K, then the pth torsion order of the matrix
tI - a is called the pth elementary divisor of a.


EXAMPLE 1 For a diagonal matrix the calculations are not difficult:

$\begin{pmatrix} 1 & 0 & 0 & 0 \\ 0 & t & 0 & 0 \\ 0 & 0 & -1 & 0 \\ 0 & 0 & 0 & t^2 + 1 \end{pmatrix}$

The nonzero 1 × 1 minors are 1, -1, t, $t^2 + 1$. Their g.c.d. is 1. The nonzero
2 × 2 minors are (apart from sign) 1, t, $t^2 + 1$, $t(t^2 + 1)$. Their g.c.d. is 1.
The nonzero 3 × 3 minors (omitting signs) are t, $t^2 + 1$, $t(t^2 + 1)$. Their
g.c.d. is 1. The nonzero 4 × 4 minor is $t(t^2 + 1)$. Hence the torsion orders are
1, 1, 1, $t(t^2 + 1)$.


EXAMPLE 2

$\begin{pmatrix} 1 & 0 & 0 & 0 \\ 0 & t & 0 & 0 \\ 0 & 0 & t(t-1) & 0 \\ 0 & 0 & 0 & 0 \end{pmatrix}$

Here $d_1 = 1$, clearly. The nonzero 2 × 2 minors (omitting signs) are t, $t(t - 1)$,
$t^2(t - 1)$. Hence $d_2 = t$. The nonzero 3 × 3 minor (omitting signs) is $t^2(t - 1) = d_3$.
The torsion orders are 1, t, $t(t - 1)$.

† r is simply the rank of the matrix a. See Corollary 2 of Theorem 3.1, Chap. 11.


We now look into the problem of computing torsion orders. The procedure we
are about to develop is based upon the method explained in Sec. 9, Chap. 9.
Theorem 7.3, Chap. 9, together with Eq. (7.3), states that if a is any matrix with
coefficients in a field L, then there exist nonsingular matrices b, c with coefficients
in L such that

bac is a diagonal matrix

The matrices b, c here correspond to certain base changes in the vector spaces in
question. In Sec. 9, Chap. 9, it was shown how to make the required base changes
by a series of elementary row and column operations on the given matrix a. It is
easily seen from Sec. 9, Chap. 9, that the matrices b, c can be determined in such a
way that

$\det b = \pm 1$,   $\det c = \pm 1$

The foregoing applies in particular to matrices with coefficients in the field K(t)
of rational functions in t. However we are interested in matrices with coefficients
in the smaller system K[t] of polynomials in t. Let us see how much of the method
of Sec. 9, Chap. 9, survives if we confine our elements to K[t], so that unrestricted
division is prohibited.

The basic operation of Sec. 9, Chap. 9, is the rather trivial one of subtracting
suitable multiples of a given row in a matrix from the other rows in order to kill off all
the elements in a specified column (save that element in the given row, of course).
(Similar operation for columns.) The only modification of the process necessitated
here is the following:

Let $a^h = (a^h_1, \ldots, a^h_n)$ and $a^j = (a^j_1, \ldots, a^j_n)$ be two rows of our matrix,
and suppose we want to subtract some multiple of $a^j$ from $a^h$ in order to get 0 in the
kth position of row h. If we confine ourselves to polynomials, that is not possible
unless $a^j_k$ divides $a^h_k$, clearly. Therefore we do the next best thing: we subtract
from $a^h$ the multiple of $a^j$ that leaves a polynomial of lowest possible degree in the
kth place of row h. More specifically, applying the division algorithm to the
polynomials $a^h_k$ and $a^j_k$, we can write

(9.9)  $a^h_k = q \cdot a^j_k + r$

where r = 0 or else $\deg r < \deg a^j_k$. Then, subtracting $q a^j$ from $a^h$, we obtain the
remainder r in the kth place of the new hth row. A similar modification applies to
elementary column operations, naturally.
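As a small worked instance of the division step (9.9), with entries chosen here purely for illustration, take $a^h_k = t^3 + 2t + 1$ and $a^j_k = t^2 + 1$:

```latex
\begin{aligned}
t^3 + 2t + 1 &= t \cdot (t^2 + 1) + (t + 1), \\
q &= t, \qquad r = t + 1, \qquad \deg r = 1 < 2 = \deg (t^2 + 1).
\end{aligned}
```

Subtracting t times row j from row h therefore leaves t + 1 in the kth place of row h, an entry of lower degree than either of the two entries we started with.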

The central point here is the following: Let $a^j_i$ and $a^h_k$ be two nonzero elements
of a matrix a with coefficients in K[t]. If neither of these two elements divides the
other, then a can be transformed by elementary row and column operations (i.e.,
using only polynomials) into a new matrix a' which contains a nonzero element of
lower degree than either $a^j_i$ or $a^h_k$. The argument is very simple. If $a^j_i$ and $a^h_k$ are
in the same column (i = k), and if $\deg a^j_k \le \deg a^h_k$, say, then we subtract $q a^j$ from
$a^h$, where q is as in (9.9). By our assumption the remainder r in (9.9) cannot be
zero and it has lower degree than $a^j_k$ and $a^h_k$; r appears, as pointed out above, in
the (h, k) position in the new matrix. A similar procedure is valid if $a^j_i$, $a^h_k$ are in
the same row (j = h). If i ≠ k and j ≠ h, the same conclusion can be arrived at
easily by comparing both $a^j_i$ and $a^h_k$ with $a^j_k$.

Thus, if $a^j_i$ is the element of least degree in a, and if it does not divide every
element of a, then elementary row and column operations of the kind just considered
lead to a new matrix a' containing an element of lower degree. Starting afresh
with a', if an element of least degree does not divide every element of a', then in the
same way we obtain a new matrix a'' containing an element of yet lower degree.
Since degrees of polynomials cannot be negative, the process must terminate after
a finite number of steps, say with a matrix b. By permuting rows and columns in b
we can assume that $b^1_1$ is a nonzero element of least degree. By what was just said,
$b^1_1$ must divide every element of b. Therefore, by subtracting suitable polynomial
multiples of the first row (or first column) of b from the other rows (or columns), we
get a matrix b' having zeros everywhere in the first row and first column, except
for $b^1_1$. Hence b' will have the form

(9.10)  $b' = \begin{pmatrix} b^1_1 & 0 & \cdots & 0 \\ 0 & & & \\ \vdots & & c & \\ 0 & & & \end{pmatrix}$

We can now repeat the same argument with the smaller matrix c, without
disturbing the first row or column of the larger matrix. The following theorem is then
evident:


THEOREM 9.4 If a is any matrix with coefficients in the polynomial ring K[t], then it is
possible to transform a into a diagonal matrix a' by a series of elementary row and
column operations,† using only polynomials. This can be done in such a way that if
$q_1(t), \ldots, q_r(t)$ are the nonzero diagonal terms in a', then $q_1 \mid q_2 \mid \cdots \mid q_r$.

For the last assertion, recall that $b^1_1$ in (9.10) divides every element in the matrix.
If we transform the submatrix c into the same form (9.10) and denote the corner
element by $c^1_1$, then $b^1_1$ must divide $c^1_1$, etc.


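† Including permutations of rows or of columns.

EXAMPLE 3 For the matrix

$\begin{pmatrix} t & 0 & 0 \\ 0 & 0 & t+1 \\ 0 & t^2 & 0 \end{pmatrix}$

we first add row 1 to row 2 and then subtract column 1 from column 3 in the result,
obtaining

$\begin{pmatrix} t & 0 & -t \\ t & 0 & 1 \\ 0 & t^2 & 0 \end{pmatrix}$

Here we have an element of degree zero. Interchange column 1, column 3 and
row 1, row 2:

$\begin{pmatrix} 1 & 0 & t \\ -t & 0 & t \\ 0 & t^2 & 0 \end{pmatrix}$

Add t × row 1 to row 2:

$\begin{pmatrix} 1 & 0 & t \\ 0 & 0 & t + t^2 \\ 0 & t^2 & 0 \end{pmatrix}$

Subtract t × column 1 from column 3:

$\begin{pmatrix} 1 & 0 & 0 \\ 0 & 0 & t + t^2 \\ 0 & t^2 & 0 \end{pmatrix}$

This has the form (9.10). We continue with the 2 × 2 matrix

$\begin{pmatrix} 0 & t + t^2 \\ t^2 & 0 \end{pmatrix}$

Add row 2 to row 1 and then, in the result, subtract column 1 from column 2:

$\begin{pmatrix} t^2 & t \\ t^2 & -t^2 \end{pmatrix}$

Subtract t × column 2 from column 1, add t × row 1 to row 2, and then interchange
column 1 and column 2:

$\begin{pmatrix} t & 0 \\ 0 & t^3 + t^2 \end{pmatrix}$

Thus the diagonal form of the 3 × 3 matrix is

$\begin{pmatrix} 1 & 0 & 0 \\ 0 & t & 0 \\ 0 & 0 & t^3 + t^2 \end{pmatrix}$

Clearly $1 \mid t \mid (t^3 + t^2)$, as promised in the theorem.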
Let R denote an elementary row operation on matrices with coefficients in K[t].
It is easily verified that

$R(ab) = R(a) \cdot b$

That is, the same result is obtained by either applying R to the product ab or by
applying R to a and then multiplying by b. Similarly, if C denotes an elementary
column operation, then

$C(ab) = a \cdot C(b)$

In particular, taking either a or b to be the identity matrix, we obtain the following
rules:

(9.11)  $R(a) = R(I) \cdot a$
        $C(a) = a \cdot C(I)$
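The rules (9.11) can be tried out directly. The sketch below, in Python, uses integer matrices for brevity rather than polynomial ones (an assumption of the example; the identities are the same) and checks that performing an operation on a agrees with multiplying a by the matrix obtained from performing the operation on the identity. The function names are illustrative.

```python
def matmul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]

def identity(n):
    return [[1 if i == j else 0 for j in range(n)] for i in range(n)]

def add_row_multiple(m, src, dst, c):
    """Elementary row operation: row dst += c * row src (src != dst)."""
    out = [row[:] for row in m]
    out[dst] = [x + c * y for x, y in zip(out[dst], out[src])]
    return out

def add_col_multiple(m, src, dst, c):
    """Elementary column operation: column dst += c * column src (src != dst)."""
    out = [row[:] for row in m]
    for row in out:
        row[dst] += c * row[src]
    return out

a = [[1, 2, 3],
     [4, 5, 6],
     [7, 8, 10]]

# R(a) = R(I) * a
row_rule = add_row_multiple(a, 0, 2, -7) == matmul(add_row_multiple(identity(3), 0, 2, -7), a)
# C(a) = a * C(I)
col_rule = add_col_multiple(a, 0, 1, -2) == matmul(a, add_col_multiple(identity(3), 0, 1, -2))

print(row_rule, col_rule)
```

Row operations act by multiplication on the left and column operations by multiplication on the right, so the two kinds of operation can be accumulated separately.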
Now let $R_1, \ldots, R_M$ and $C_1, \ldots, C_N$ denote the series of row and column
operations required to reduce a matrix a to diagonal form a' according to Theorem
9.4. Set

$b_i = R_i(I)$   and   $c_j = C_j(I)$

By repeated applications of (9.11) we get

(9.12)  $a' = b_M \cdots b_2 b_1 \, a \, c_1 c_2 \cdots c_N$

Consider one of the matrices $b_i$: It is obtained from the unit matrix by an
elementary row operation, which does not alter the determinant, or by a permutation,
which at most changes the sign. Therefore, $\det b_i = \pm 1$. Similarly, $\det c_j = \pm 1$.
Write

$b = b_M \cdots b_2 b_1$   and   $c = c_1 c_2 \cdots c_N$

We have then

(9.13)  $a' = bac$,   $\det b = \pm 1$,   $\det c = \pm 1$

From Theorem 9.2 it follows that a and a' have the same torsion orders. But for
the diagonal matrix a' it is very easily seen that the nonzero torsion orders are
(apart from constant factors) just the nonzero diagonal elements $q_1(t), \ldots, q_r(t)$.
This follows at once from the definitions and the fact that $q_1 \mid q_2 \mid \cdots \mid q_r$. We have
thus proved the following theorem:




THEOREM 9.5 Let a be a matrix of rank r with coefficients in the polynomial ring K[t].
Then the torsion orders $q_1, \ldots, q_r$ of a satisfy $q_1 \mid q_2 \mid \cdots \mid q_r$. Moreover, the torsion
orders are, except for constant factors, the same as the nonzero diagonal elements
obtained when a is put into diagonal form by elementary row and column operations
according to Theorem 9.4.

(The nonzero torsion orders are monic polynomials, as follows from the definition.)

Recall that the elementary divisors of a square matrix a with coefficients in the
field K are by definition the torsion orders of the matrix tI - a. Hence, the following
is true:

COROLLARY The nonzero elementary divisors of a square matrix with coefficients in K
satisfy $q_1(t) \mid q_2(t) \mid \cdots \mid q_n(t)$. Moreover they can be computed by a definite procedure
involving only a finite number of additions and multiplications.


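EXAMPLE 4 Compute the elementary divisors of

$a = \begin{pmatrix} 3 & 0 & 1 \\ 2 & -1 & 0 \\ 0 & 2 & -1 \end{pmatrix}$

We have

$tI - a = \begin{pmatrix} t-3 & 0 & -1 \\ -2 & t+1 & 0 \\ 0 & -2 & t+1 \end{pmatrix} \to \begin{pmatrix} -1 & 0 & t-3 \\ 0 & t+1 & -2 \\ t+1 & -2 & 0 \end{pmatrix}$

permuting columns 1 and 3. Add (t + 1) × row 1 to row 3; and then add (t - 3) ×
column 1 to column 3. The result:

$\begin{pmatrix} -1 & 0 & 0 \\ 0 & t+1 & -2 \\ 0 & -2 & t^2 - 2t - 3 \end{pmatrix}$

Exchange rows 2 and 3:

$\begin{pmatrix} -1 & 0 & 0 \\ 0 & -2 & t^2 - 2t - 3 \\ 0 & t+1 & -2 \end{pmatrix}$

Add $\tfrac{1}{2}(t + 1)$ × row 2 to row 3:

$\begin{pmatrix} -1 & 0 & 0 \\ 0 & -2 & t^2 - 2t - 3 \\ 0 & 0 & \tfrac{1}{2}(t^3 - t^2 - 5t - 7) \end{pmatrix}$

Add $\tfrac{1}{2}(t^2 - 2t - 3)$ × column 2 to column 3:

$\begin{pmatrix} -1 & 0 & 0 \\ 0 & -2 & 0 \\ 0 & 0 & \tfrac{1}{2}(t^3 - t^2 - 5t - 7) \end{pmatrix}$

The elementary divisors of a (that is, the torsion orders of tI - a) are 1, 1,
$t^3 - t^2 - 5t - 7$, since they are monic polynomials. The last one is the characteristic
polynomial of a.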
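The result of Example 4 can be cross-checked straight from Definition 9.1, without any row or column reduction. The following Python sketch computes $d_p$ as the monic g.c.d. of all p × p minors and then the quotients $q_p = d_p / d_{p-1}$. It is a brute-force illustration suited only to small matrices; the coefficient-list representation of polynomials (lowest degree first, `[]` for zero) and all function names are assumptions of this example.

```python
from fractions import Fraction
from itertools import combinations, permutations

def trim(p):
    p = list(p)
    while p and p[-1] == 0:
        p.pop()
    return p

def add(p, q):
    n = max(len(p), len(q))
    return trim([(p[i] if i < len(p) else 0) + (q[i] if i < len(q) else 0)
                 for i in range(n)])

def mul(p, q):
    if not p or not q:
        return []
    out = [Fraction(0)] * (len(p) + len(q) - 1)
    for i, x in enumerate(p):
        for j, y in enumerate(q):
            out[i + j] += x * y
    return trim(out)

def divmod_poly(a, b):
    """a = q*b + r with r = 0 or deg r < deg b; b must be nonzero."""
    b = trim(b)
    r = trim([Fraction(x) for x in a])
    q = [Fraction(0)] * max(len(r) - len(b) + 1, 0)
    while len(r) >= len(b):
        shift = len(r) - len(b)
        c = r[-1] / b[-1]
        q[shift] = c
        r = trim([x - c * b[i - shift] if i >= shift else x
                  for i, x in enumerate(r)])
    return trim(q), r

def gcd(p, q):
    """Monic g.c.d. of two polynomials ([] if both are zero)."""
    p, q = trim(p), trim(q)
    while q:
        p, q = q, divmod_poly(p, q)[1]
    return [Fraction(x) / p[-1] for x in p]

def det(m):
    """Determinant of a square matrix of polynomials, by permutation expansion."""
    n, total = len(m), []
    for perm in permutations(range(n)):
        inversions = sum(1 for i in range(n) for j in range(i + 1, n)
                         if perm[i] > perm[j])
        term = [Fraction(-1 if inversions % 2 else 1)]
        for i in range(n):
            term = mul(term, m[i][perm[i]])
        total = add(total, term)
    return total

def torsion_orders(m):
    """Nonzero torsion orders q_1, ..., q_r of a square polynomial matrix."""
    n, d_prev, orders = len(m), [Fraction(1)], []
    for p in range(1, n + 1):
        d = []
        for rows in combinations(range(n), p):
            for cols in combinations(range(n), p):
                d = gcd(d, det([[m[i][j] for j in cols] for i in rows]))
        if not d:                 # all p x p minors vanish: p exceeds the rank
            break
        orders.append(divmod_poly(d, d_prev)[0])
        d_prev = d
    return orders

# tI - a for the matrix a of Example 4; e.g. [-3, 1] stands for t - 3.
t_minus_a = [[[-3, 1], [], [-1]],
             [[-2], [1, 1], []],
             [[], [-2], [1, 1]]]

orders = torsion_orders(t_minus_a)
print(orders)
```

The last entry of `orders` is the coefficient list of $t^3 - t^2 - 5t - 7$, in agreement with the reduction carried out above. The reduction of Theorem 9.4 is far more economical for larger matrices, since the number of minors grows rapidly with the size.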

REMARK. We had occasion to remark in Chap. 6 the very close parallel between
the theory of the g.c.d. for integers and the analogous theory for polynomials.
It is easy to see that all the foregoing material, with the obvious modifications of
terminology, holds equally well for matrices with integral coefficients. That is,
one simply replaces the integral domain K[t] by Z.


EXERCISES 


Find the torsion orders of the following matrices: 


(( 0 
0 t+1 


370 ¢+1 Q 4, (, ') 
) 0 42, 
¢ 190 ¢ 1 0 
5& (0 ¢ 0 6.10 t U 
Oo 0 ¢ 0 Oo t+1 
7. The diagonal matrix with diagonal terms $t - \beta_1, t - \beta_2, \ldots, t - \beta_n$, with
$\beta_1, \ldots, \beta_n$ all distinct.
8. The matrix (tI - a) where a is the matrix given in Exercise 6, Sec. 5.
& 1 9 «0 
0 & 0 9 . 
Xa = ith 2 
a 00 & 1 with 6: x Bs 
0 0 0 & 


10. a ag in Exercise 9 with 


10. Elementary divisors and similarity


It follows at once from Theorem 9.2 that the elementary divisors of similar matrices
are identical [cf. Eq. (8.4)]. Therefore one can define the elementary divisors of a
linear mapping T of a finite-dimensional vector space to be the elementary divisors
of the matrix of T with respect to any base for the space. In fact, any two such
matrices are similar.

In (8.5) we have given a criterion for two linear mappings to be similar, in the
special case where all the eigenvalues are in the field of scalars. We shall now give
another criterion.




THEOREM 10.1 Let U be a vector space of finite dimension over a field K. Let $T_1$ and
$T_2$ be two linear mappings of U to U. A necessary and sufficient condition for $T_1$ and
$T_2$ to be similar is that they have the same elementary divisors.

We shall give two proofs for Theorem 10.1. Our first proof is for the illuminating
special case that all the eigenvalues of the mappings are in the field K. Our proof
for the general case is given in Sec. 11.

Proof. First of all, if $T_1$ and $T_2$ are similar, then their matrices relative
to any base in U are similar. Since similar matrices have the same elementary
divisors, it follows that $T_1$ and $T_2$ have the same elementary divisors.

To prove the converse, we assume that the eigenvalues of $T_1$ and $T_2$ are
all in K, and we assume that $T_1$ and $T_2$ have the same elementary divisors.
We have only to prove that $T_1$ and $T_2$ have the same Jordan normal form,
and then it follows from Sec. 8 that $T_1 \sim T_2$.

Consider first the computation of the elementary divisors of a linear
mapping T such that $T - \rho I$ is nilcyclic of order n. Let B be a cyclic base
with respect to $T - \rho I$, and let a denote the matrix of T relative to that
base. To compute the elementary divisors we must consider the matrix
tI - a. A series of elementary operations reduces tI - a as follows:

$tI - a = \begin{pmatrix} t - \rho & -1 & 0 & \cdots & 0 \\ 0 & t - \rho & -1 & \cdots & 0 \\ \vdots & & \ddots & \ddots & \vdots \\ 0 & 0 & \cdots & t - \rho & -1 \\ 0 & 0 & \cdots & 0 & t - \rho \end{pmatrix} \to \cdots \to$

(10.1)  $\begin{pmatrix} -1 & & & \\ & \ddots & & \\ & & -1 & \\ & & & (t - \rho)^n \end{pmatrix}$

Thus the elementary divisors are 1, 1, ..., $(t - \rho)^n$.


Next consider the Jordan normal form of a linear mapping T whose
eigenvalues are all in the field K. Let a be its matrix relative to some base.
Since a is similar to a matrix in Jordan normal form, $tI - a$ has the same
torsion orders as a matrix of diagonal blocks, each block of the form (10.1)
above. Now the torsion orders of a diagonal matrix of this sort, having
diagonal elements either $-1$ or else of the type $(t - p)^n$, are easily
computed, and it is quickly seen that the matrix can be reconstructed from the
knowledge of its torsion orders, apart from a permutation of the blocks.
Therefore from the elementary divisors we can reconstruct a Jordan normal
form for a linear mapping T. If $T_1$ and $T_2$ have the same elementary
divisors, they therefore have identical Jordan normal forms, whence
$T_1 \sim T_2$.
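
The reconstruction just described can be sketched in a few lines of Python. This is only an illustration under stated assumptions: each elementary divisor is supposed to be already factored into powers of linear factors and is encoded as a dictionary sending an eigenvalue to its exponent (the divisor 1 is the empty dictionary); the function names `jordan_blocks` and `jordan_matrix` are ours; and the 1's are placed above the diagonal, which is one of the two usual conventions.

```python
def jordan_blocks(elementary_divisors):
    """List the Jordan blocks as (eigenvalue, size) pairs.

    Each elementary divisor is a dict {eigenvalue: exponent} standing for the
    product of the factors (t - eigenvalue)**exponent; {} stands for 1.
    """
    blocks = []
    for divisor in elementary_divisors:
        for eigenvalue, exponent in divisor.items():
            if exponent > 0:
                # Each factor (t - eigenvalue)**exponent gives one block.
                blocks.append((eigenvalue, exponent))
    return sorted(blocks)


def jordan_matrix(blocks):
    """Assemble the block-diagonal matrix from (eigenvalue, size) pairs."""
    n = sum(size for _, size in blocks)
    matrix = [[0] * n for _ in range(n)]
    start = 0
    for eigenvalue, size in blocks:
        for k in range(size):
            matrix[start + k][start + k] = eigenvalue
            if k + 1 < size:
                matrix[start + k][start + k + 1] = 1  # assumed convention
        start += size
    return matrix


# Elementary divisors 1, 1, 1, (t - 5)^2, (t - 5)^3.
divisors = [{}, {}, {}, {5: 2}, {5: 3}]
blocks = jordan_blocks(divisors)
matrix = jordan_matrix(blocks)
```

Here `blocks` lists one 2 × 2 block and one 3 × 3 block for the eigenvalue 5, and `matrix` is the corresponding 5 × 5 matrix. The order of the blocks is a matter of choice, in agreement with the remark that the form is determined only up to a permutation of the blocks.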


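EXAMPLE 1 Let the torsion orders be $1, 1, 1, (t - 5)^2, (t - 5)^3$. For a matrix of
the type just discussed, we must have a 2 × 2 block and a 3 × 3 block. The
following diagonal matrix meets the requirements:

$$
\begin{pmatrix}
-1 & & & & \\
 & (t-5)^2 & & & \\
 & & -1 & & \\
 & & & -1 & \\
 & & & & (t-5)^3
\end{pmatrix}
$$

EXAMPLE 2 Let the torsion orders be $1, 1, 1, (t - 2)^2 (t + 1)^3$. Again we need
a 2 × 2 block and a 3 × 3 block:

$$
\begin{pmatrix}
-1 & & & & \\
 & (t-2)^2 & & & \\
 & & -1 & & \\
 & & & -1 & \\
 & & & & (t+1)^3
\end{pmatrix}
$$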


EXERCISES 


Reconstruct the Jordan normal form from the given elementary divisors. 
Lie 
2114140 




31111 + DE-DE -2) 

441128400 -1 

5. 1,1,1, 26 +17? 

6-10. Given that T has the elementary divisors above, find the elementary 
divisors of T?. 


11. Modules, torsion orders, and the rational canonical form 


By definition, the elementary divisors of an n × n matrix a are the torsion orders
of the matrix $tI - a$, where t is an indeterminate and I is the n × n identity
matrix. In this section we shall interpret the significance of the matrix $tI - a$.
Thereafter, Theorem 10.1 will become transparent.

Let U be a finite-dimensional vector space over a field K, and let T be an
endomorphism of U. We shall define, strange as it may seem, a K[t]-module structure
on U. (See Sec. 11, Chap. 9.) The definition is as follows:

For any polynomial p(t) in K[t], we define the scalar multiplication by

11.1    $p \cdot x = p(T)(x)$

for any x in U. Inasmuch as the mapping $p(t) \to p(T)$ is a ring homomorphism of
K[t] into the ring of endomorphisms of the vector space U, it is readily seen that
(11.1) defines U as a module over the ring K[t], and we denote it by $U^T$.

Although the module U, considered as a module over the field K, has a base, the
module $U^T$, as a module over K[t], does not have a base. For if e were a base element
of $U^T$, it would be true that $p(t)e \neq 0$ for any nonzero polynomial p(t), which is
impossible since $\varphi(t)e = \varphi(T) \cdot e = 0$ if we take $\varphi(t)$ to be the characteristic
polynomial of T. Given two endomorphisms $T_1$ and $T_2$ of the vector space U, we
obtain two K[t]-modules $U^{T_1}$ and $U^{T_2}$, respectively.
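
The scalar multiplication of a vector by a polynomial can be tried out numerically. The following Python sketch is an illustration only: the endomorphism is given by an assumed 2 × 2 integer matrix `a`, a polynomial is a list of coefficients in increasing degree, and the names `mat_vec` and `act` are ours. It evaluates p(T)(x) by Horner's scheme and checks that the characteristic polynomial annihilates every vector, which is the reason no element can serve as a base element.

```python
def mat_vec(a, x):
    """Apply the matrix a to the column vector x."""
    return [sum(a[i][j] * x[j] for j in range(len(x))) for i in range(len(a))]


def act(p, a, x):
    """Return p(T)(x), where T has matrix a and p = [c0, c1, ..., cd]
    stands for c0 + c1*t + ... + cd*t**d (Horner's scheme)."""
    result = [0] * len(x)
    for c in reversed(p):
        result = mat_vec(a, result)                      # multiply by t
        result = [r + c * xi for r, xi in zip(result, x)]  # add c * x
    return result


a = [[2, 1],
     [0, 3]]                     # assumed example; eigenvalues 2 and 3
char_poly = [6, -5, 1]           # (t - 2)(t - 3) = t^2 - 5t + 6

t_times_x = act([0, 1], a, [0, 1])         # t . x is simply T(x)
killed = [act(char_poly, a, x) for x in ([1, 0], [0, 1])]
```

With these data `t_times_x` is the second column of `a`, and both entries of `killed` are the zero vector.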


11.2    The endomorphisms $T_1$ and $T_2$ are similar if and only if the modules $U^{T_1}$ and
$U^{T_2}$ are isomorphic.


Proof. By definition, $T_1$ and $T_2$ are similar if and only if there is a
vector-space isomorphism S: U → U such that

11.3    $T_2 S = S T_1$

By definition of the modules $U^{T_1}$ and $U^{T_2}$, we have

$tx = T_1 x$    for all x in $U^{T_1}$
$tx = T_2 x$    for all x in $U^{T_2}$

Hence, if S is a vector-space isomorphism satisfying (11.3),

$S(tx) = S T_1(x) = T_2 S(x) = t S(x)$

for all x in $U^{T_1}$. It follows in turn that

$S(t^2 x) = S(t(tx)) = t S(tx) = t^2 S(x)$

and more generally that

11.4    $S(p(t)x) = p(t)S(x)$

for all x in the module $U^{T_1}$ and for all polynomials p(t). Equation (11.4)
shows that S is not merely an isomorphism of the vector space U (over
the field K) but is in addition a module isomorphism over the ring K[t],
of $U^{T_1}$ onto $U^{T_2}$. The converse can be proved by retracing steps.
Thereby, assertion (11.2) is proved.


Our strategy for proving Theorem 10.1 can now be revealed. We shall show
that if T is an endomorphism of the vector space U over the field K, then we can
construct a module over the ring K[t] isomorphic to the above-defined module $U^T$,
from merely a knowledge of the elementary divisors of T. An immediate
consequence of such a construction is: if two endomorphisms $T_1$ and $T_2$ have the same
elementary divisors, then $U^{T_1}$ and $U^{T_2}$ are isomorphic; hence $T_1$ and $T_2$ are similar.

Our construction will come in two steps. Throughout the discussion below, T
denotes a fixed endomorphism U → U, where U is an n-dimensional vector space
over the field K; $U^T$ denotes the corresponding module over the ring K[t].


STEP 1 Construct a free module M over the ring K[t] and a submodule N such
that

11.5    $M/N \cong U^T$

where $\cong$ denotes "is isomorphic to as a module over the ring K[t]."
The construction is as follows. Let $B = \{e_1, \ldots, e_n\}$ be a base of the vector
space U over the field K. The mapping

$\varphi: K^n \to U$

given by

$\varphi(c_1, \ldots, c_n) = c_1 e_1 + \cdots + c_n e_n$    ($c_i$ in K)

is a vector-space isomorphism. Set

$e_i^* = \varphi^{-1}(e_i)$

Set

$M = K[t]^n$

that is, M is the K[t]-module of all n-tuples $(r_1, \ldots, r_n)$ where each $r_i$ belongs to
the ring K[t].

Let $a = (a^j_i)$ denote the matrix of the endomorphism T with respect to B;
that is,

11.6    $T e_i = \sum_j a^j_i e_j$    (i = 1, 2, ..., n)

We may regard each element x in $K^n$ as an element in the larger module $M = K[t]^n$.
In particular, $B^* = \{e_1^*, \ldots, e_n^*\}$ is a base of the module M. Let T* denote the
endomorphism of M whose matrix with respect to the base B* is a; that is,

11.7    $T^* e_i^* = \sum_j a^j_i e_j^*$

We may at last specify the desired submodule N. Set

11.8    $N = \text{image}\,(tI^* - T^*) = (tI^* - T^*)M$

where I* denotes the identity mapping of M onto M. More explicitly, N is the
submodule of M generated by the n-tuples

$(t - a^1_1, \; -a^2_1, \; \ldots, \; -a^n_1)$
$\cdots$
$(-a^1_n, \; -a^2_n, \; \ldots, \; t - a^n_n)$

in other words, the columns of the matrix

11.8'    $tI - a$


We now prove that $M/N \cong U^T$. Let $f: M \to U^T$ be the K[t]-module
homomorphism such that

11.9    $f(e_i^*) = e_i$    (i = 1, 2, ..., n)

The existence and uniqueness of f is assured by the result in Sec. 11, Chap. 9.
Moreover,

11.10    $f(T^*(e_i^*)) = f \circ T^* \circ \varphi^{-1}(e_i) = f \circ \varphi^{-1} \circ T(e_i) = T(e_i)$

for (i = 1, ..., n), since $T \circ \varphi(x) = \varphi \circ T^*(x)$ for all x in the subset $K^n$ of the
module M by (11.7), and f coincides with $\varphi$ on $K^n$ by (11.9). Hence

11.11    $f \circ T^* = T \circ f$

On the other hand, since f is a K[t]-module homomorphism,

$f(tx) = t f(x) = T f(x)$

for all x in M. Hence

$f \circ tI^* = T \circ f = f \circ T^*$

and consequently

11.12    $f \circ (tI^* - T^*) = 0$

In other words, $N = (tI^* - T^*)M$ is in the kernel of f.
It follows from Sec. 11, Chap. 9, that there is a homomorphism $f': M/N \to U^T$
such that $f' \circ \pi = f$, where $\pi$ is the projection of M onto M/N; in a diagram,

$M \xrightarrow{\;\pi\;} M/N \xrightarrow{\;f'\;} U^T, \qquad f = f' \circ \pi$


We shall prove that $f'$ is an isomorphism by showing that it has an inverse. Let
$g: U^T \to M/N$ be the mapping given by $U^T \xrightarrow{\varphi^{-1}} K^n \subset M \xrightarrow{\pi} M/N$, that is,

$g = \pi \circ \varphi^{-1}$

We have $f' \circ g = f' \circ \pi \circ \varphi^{-1} = f \circ \varphi^{-1}$ = the identity mapping of $U^T$. It follows
at once that g is a K[t]-module monomorphism. In order to prove that g is an
epimorphism, we note that

$tI^*(x) \equiv T^*(x) \bmod N$

for any x in M, where "$x \equiv y \bmod N$" means "x − y is in N." Hence, mod N,
we get

$tx \equiv T^*(x)$

$t^2 x = t \cdot (tx) \equiv t\,T^*(x) = T^*(tx) \equiv T^*(T^*(x))$

inasmuch as T* is a K[t]-module endomorphism and $T^*(x) \equiv T^*(y) \bmod N$ if
$x \equiv y \bmod N$. It follows similarly that

$t^k x \equiv T^{*k}(x) \bmod N$

and

$p(t)x \equiv p(T^*)(x) \bmod N$

for any polynomial p(t) in the ring K[t]. Given now any element x in M, we may
write

$x = p_1(t)e_1^* + \cdots + p_n(t)e_n^*$

with $p_1(t), \ldots, p_n(t)$ elements of the ring K[t]. Consequently,

$x \equiv p_1(T^*)e_1^* + p_2(T^*)e_2^* + \cdots + p_n(T^*)e_n^* \bmod N$

From (11.7) we see that T* sends $K^n$ into $K^n$. Hence the right-hand term of the
above equation is in $K^n$ and each element of M can be expressed in the form

$y + u$

with y in the subset $K^n$ and u in the submodule N; in other words, we can write
$M = K^n + N$. Now

$g(U^T) = \pi \circ \varphi^{-1}(U^T) = \pi(K^n) = \pi(K^n + N) = \pi(M) = M/N$

Consequently g is an epimorphism. It follows at once that g is an isomorphism
and that $f'$ is an isomorphism too. We sum up our result in the following theorem.


THEOREM 11.1 Let T be an endomorphism of an n-dimensional vector space U over a
field K, and let $U^T$ be the corresponding K[t]-module formed from U upon defining
$p(t)x = p(T)x$ for any p(t) in the ring of polynomials in an indeterminate t. Let M
be the free K[t]-module $K[t]^n$, and let T* be the endomorphism of M defined by (11.7).
Let N denote the image of the endomorphism $(tI^* - T^*)$, where I* is the identity
mapping of M. Then as K[t]-module, M/N is isomorphic to $U^T$.


STEP 2 We now relate the elementary divisors of the endomorphism T to the
K[t]-submodule N.

By definition, the elementary divisors of the endomorphism T are the torsion
orders of the matrix $tI - a$, where a is the matrix of T with respect to any base.
By (9.13), we have

11.13    $tI - a = b \Delta c$

where $\Delta$, b, and c are matrices with coefficients in K[t] with det b = ±1, det c = ±1,
and $\Delta$ is a diagonal matrix,

$$
\Delta = \begin{pmatrix} q_1(t) & & \\ & \ddots & \\ & & q_n(t) \end{pmatrix}
$$

with $q_1(t) \mid q_2(t) \mid \cdots \mid q_n(t)$.
Let $S^*$, $\Delta^*$, and $R^*$ denote the endomorphisms of M whose matrices with respect
to the base $B^* = \{e_1^*, \ldots, e_n^*\}$ are b, $\Delta$, and c. Inasmuch as det b = ±1, the
inverse matrix $b^{-1}$ has its coefficients in the ring K[t], and consequently the
endomorphism $S^*$ of the module M is an invertible automorphism of M. Similarly, the
module endomorphism $R^*$ is an invertible automorphism. Since the left side of
Eq. (11.13) is the matrix of $tI^* - T^*$ with respect to B*, we have

11.14    $tI^* - T^* = S^* \Delta^* R^*$

Hence $N = (tI^* - T^*)(M) = S^* \Delta^* R^*(M) = S^* \Delta^*(M)$, and so

11.15    $S^{*-1}(N) = \Delta^*(M)$

Equation (11.15) implies that the automorphism $S^{*-1}$ of M sends N onto the
submodule $\Delta^*(M)$. Hence, by Sec. 11, Chap. 9, $S^{*-1}$ induces an
isomorphism of the quotient module M/N onto the quotient module $M/\Delta^*(M)$.
Now the matrix $\Delta$ is obviously determined by the elementary divisors $q_1(t), \ldots,$
$q_n(t)$. Hence the quotient module $M/\Delta^*(M)$ is determined by the elementary
divisors. Since

$M/\Delta^*(M) \cong M/N \cong U^T$

we find that, to within an isomorphism, the K[t]-module $U^T$ is uniquely determined
by the elementary divisors of T. In view of remark (11.2), Theorem 10.1 is now
proved.

However, our proof gives much more information than Theorem 10.1. For the
module $U^T$ can be described quite explicitly in terms of the elementary divisors.
The module $M = K[t]^n$ is the direct sum of n copies of K[t]:

$M = K[t] \oplus \cdots \oplus K[t]$

11.16    $\Delta^*(M) = (q_1(t)) \oplus \cdots \oplus (q_n(t))$

where $(q_i(t))$ denotes the principal ideal in K[t] generated by $q_i(t)$, that is, the
submodule of K[t] with the single base element $q_i(t)$. The quotient module has
the form

11.16'    $M/N \cong K[t]/(q_1(t)) \oplus \cdots \oplus K[t]/(q_n(t))$


It remains to say some words about a K[t]-module of the form $K[t]/(q(t))$, where
we may assume that q(t) is a monic polynomial. We have seen that a vector space
U over the field K together with an endomorphism of U defines a module over the
ring K[t]. Conversely, a K[t]-module L defines a vector space over the field K
together with an endomorphism T defined by Tx = tx for all x in L. Considering
$U = K[t]/(q(t))$ as a vector space over the field K, what is the nature of the
corresponding endomorphism T?

Let $q(t) = a_0 + a_1 t + \cdots + t^r$ be a monic polynomial of degree r. Then as a
vector space over K, $K[t]/(q(t))$ has as a base the r elements $\pi(1), \pi(t), \ldots,$
$\pi(t^{r-1})$, where $\pi: K[t] \to K[t]/(q(t))$ is the projection (cf. Exercise 11, Sec. 11,
Chap. 9). Moreover the projection $\pi$ is a ring homomorphism (cf. Exercise 9,
Sec. 11, Chap. 9). Consequently we have

$t\pi(1) = \pi(t)$
$\cdots$
$t\pi(t^k) = \pi(t^{k+1})$
$\cdots$
$t\pi(t^{r-1}) = \pi(t^r)$

However $\pi(a_0 + a_1 t + \cdots + t^r) = 0$ implies

$\pi(t^r) = -(a_0 \pi(1) + a_1 \pi(t) + \cdots + a_{r-1} \pi(t^{r-1}))$

In short, relative to the base $\pi(1), \ldots, \pi(t^{r-1})$, the matrix of the endomorphism
T of the vector space U is

$$
\text{11.17} \qquad
\begin{pmatrix}
0 & 0 & \cdots & 0 & -a_0 \\
1 & 0 & \cdots & 0 & -a_1 \\
0 & 1 & \cdots & 0 & -a_2 \\
\vdots & & \ddots & & \vdots \\
0 & 0 & \cdots & 1 & -a_{r-1}
\end{pmatrix}
$$

We have q(T) = 0 since

$q(T)(U) = q(t)\pi(K[t]) = \pi(q(t)K[t]) = 0$

Moreover, for any nonzero polynomial p(t) of degree less than r, $p(T) \neq 0$, since

$p(T)\pi(1) = p(t)\pi(1) = \pi(p(t)) \neq 0$

Consequently,

11.18    q(t) is the minimal polynomial of the endomorphism T.

We can sum up our conclusions in the following way:
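
The matrix attached to a monic polynomial q(t) can be checked by machine. The Python sketch below is illustrative and rests on stated assumptions: the columns of the matrix are the components of the images of the base vectors π(1), ..., π(t^(r-1)), the polynomial is given by the list [a_0, ..., a_(r-1)] of its lower coefficients, and the names `companion`, `mat_mul` and `evaluate` are ours. It builds the matrix and verifies that q(T) = 0 for two sample polynomials.

```python
def companion(coeffs):
    """Matrix of multiplication by t on K[t]/(q), where
    q(t) = a0 + a1*t + ... + a_{r-1}*t**(r-1) + t**r and coeffs = [a0, ..., a_{r-1}]."""
    r = len(coeffs)
    c = [[0] * r for _ in range(r)]
    for k in range(r - 1):
        c[k + 1][k] = 1             # t * pi(t^k) = pi(t^(k+1))
    for k in range(r):
        c[k][r - 1] = -coeffs[k]    # pi(t^r) = -(a0 pi(1) + ... )
    return c


def mat_mul(a, b):
    """Product of two square matrices of the same size."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def evaluate(coeffs, c):
    """Evaluate the monic polynomial q at the square matrix c (Horner's scheme)."""
    r = len(c)
    identity = [[int(i == j) for j in range(r)] for i in range(r)]
    result = identity               # leading coefficient 1
    for a_k in reversed(coeffs):
        result = mat_mul(result, c)
        result = [[result[i][j] + a_k * identity[i][j] for j in range(r)]
                  for i in range(r)]
    return result


quadratic = [6, -5]                 # q(t) = t^2 - 5t + 6
cubic = [-2, 0, 0]                  # q(t) = t^3 - 2
c2 = companion(quadratic)
c3 = companion(cubic)
zero2 = evaluate(quadratic, c2)
zero3 = evaluate(cubic, c3)
```

Both `zero2` and `zero3` are zero matrices, in agreement with q(T) = 0.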
We can sum up our conclusions in the following way: 


THEOREM 11.2 (The theorem on the rational canonical form) Let T be an
endomorphism of an n-dimensional vector space U over a field K, and let $q_1(t) \mid q_2(t) \mid$
$\cdots \mid q_r(t)$ be the elementary divisors of T that are distinct from 1. Then there is a
base for U with respect to which the matrix of T has the block form


0 0 0 

0 ay 0 
FrRr 

0 0 se ae, 


where the first 0 is an (n ~ r)z(n — 1)0 matrix and where the ith block is determined 
from the i-th elementary divisor $q_i(t)$ by the formula (11.17). Moreover, the minimal
polynomial of T is $q_r(t)$.

The block form (11.19) comes from the direct sum (11.16'), and that is why we
can ignore the n − r elementary divisors which are equal to 1: they yield a
summand K[t]/(1), which is the zero subspace.

Note 1. There can be no zero elementary divisors since the product
of the elementary divisors is the characteristic polynomial $\varphi(t) = \det(tI - T)$.
Also, $\varphi(t) \mid q_r(t)^r$, where r is the number of elementary divisors
different from 1. Thus the characteristic polynomial is the minimal
polynomial if and only if r = 1.

Note 2. Assertion (11.16) can be regarded as an interpretation of the
matrix equation (11.13). On the other hand equation (11.13) is a special
instance of Eq. (9.13):

9.13    $\Delta = bac$    det b = ±1    det c = ±1

where a is an m × n matrix with coefficients in K[t], and $\Delta$ is the diagonal
m × n matrix of rank r:

$$
\Delta = \begin{pmatrix}
q_1(t) & \cdots & 0 & 0 & \cdots & 0 \\
\vdots & \ddots & \vdots & \vdots & & \vdots \\
0 & \cdots & q_r(t) & 0 & \cdots & 0 \\
0 & \cdots & 0 & 0 & \cdots & 0 \\
\vdots & & \vdots & \vdots & & \vdots
\end{pmatrix}
$$

and b, c are square m × m and n × n matrices with coefficients in K[t].
Equation (9.13) has an interpretation similar to (11.16). Namely, let
M denote the free K[t]-module $K[t]^m$, and let N denote the submodule
generated by the n columns of the matrix a:

$a = (a_1, a_2, \ldots, a_n)$

Equation (9.13) means in effect: upon replacing the base $\{e_1^*, \ldots, e_m^*\}$ of M
by the base $\{b_1^*, \ldots, b_m^*\}$ consisting of the column vectors of the inverse matrix
$b^{-1}$, and upon replacing the submodule generators $a_1, \ldots, a_n$ by the linear
combinations made up of the column vectors of the matrix ac, the submodule N
becomes simply generated by

11.21    $q_1 b_1^*, \; q_2 b_2^*, \; \ldots, \; q_r b_r^*$

where $q_1 \mid q_2 \mid \cdots \mid q_r$ are the torsion orders of the matrix a.
The proof consists of introducing the free module $M_1 = K[t]^n$, the module
homomorphisms

$T^*$ and $\Delta^*$: $K[t]^n \to K[t]^m$

and the module automorphisms

$S^*: K[t]^m \to K[t]^m$
$R^*: K[t]^n \to K[t]^n$

whose matrices are, respectively,

a and $\Delta$ with respect to $\{e_1^*, \ldots, e_n^*\}$, $\{e_1^*, \ldots, e_m^*\}$
b with respect to $\{e_1^*, \ldots, e_m^*\}$
c with respect to $\{e_1^*, \ldots, e_n^*\}$

The submodule N is the image of T*, and the base (11.21) is found from the relation

$\{q_1 e_1^*, \ldots, q_r e_r^*, 0, \ldots, 0\} = \{\Delta^*(e_1^*), \ldots, \Delta^*(e_n^*)\}$
$= \{(S^* T^* R^*)(e_1^*), \ldots, (S^* T^* R^*)(e_n^*)\}$

EXERCISES 


1. Deduce from the rational canonical form theorem that every nilpotent
endomorphism of a finite-dimensional vector space is a direct sum of nilcyclic parts.

2. Carry out a discussion of matrix equation (9.13), replacing the ring K[t] by
the ring of integers Z. As a consequence prove

(a) Every submodule of a finitely generated free Z-module has a base of the type
(11.21).

(b) Every finite abelian group G is a direct sum of cyclic groups of orders $q_1,$
$q_2, \ldots, q_r$ with $q_1 \mid q_2 \mid \cdots \mid q_r$. These numbers are called the torsion orders of G.
Prove that the torsion orders of G are unique.




3. Let T be an endomorphism of a finite-dimensional vector space U over a field K,
and let q denote the product of all the irreducible factors in the minimal polynomial
of T. Assume that q and its derivative q' have g.c.d. 1. (This is automatically
satisfied if K is of characteristic zero or is finite.)

(a) Prove that $q(T)^r = 0$ for some r.

(b) Find a polynomial f(#) such that 

(q(T — f(T)" = 0 
[Hint: Expanding in Taylor's series, 

at — fa) = a — TOSOIoH) + --- 
and substituting { = T, the problem reduces to finding f(/) so that g(2)\(t — 
ere 

4. Continuing the notation of Exercise 3, let $\theta$ denote the ring-endomorphism of
K[t] given by

$t \to t - f(t)q(t)$
Prove the following: 

(a) g(0@) = a@)°A® (mod g(t)" 

() CEO) = a@)*f(t) (med g(t) 

(@(t) denoting @(@(4)); and f(t), fe(d) in KIA) 

(c) q(@*(E)) = 0 for 2 > r. 

5. Set $S = \theta(T)$, $N = T - S$. Assuming that all the roots of q(t) are in the
field K, prove that S and N are the endomorphisms described in Theorem 6.3.

6. Let S be an endomorphism of a vector space U over a field K, and let q(t) be
the minimal polynomial of S. Assume that q(t) has no repeated factors. (Such an
endomorphism is called semisimple.) Prove: Any S-stable subspace V admits an
S-stable complement W such that U = V ⊕ W (direct).


12. Finitely generated abelian groups 


We have already remarked that a module over the ring of integers Z is equivalent
to an abelian group. On the other hand, one may note (cf. Exercise 2, Sec. 11) that
the diagonalization process of Sec. 9 carries over bodily for matrices with integer
coefficients if we replace the ring K[t] by the ring Z. As a result, one can assert the
analogue of Note 2 above for the case of abelian groups. Thus, given a free abelian
group M (i.e., a free Z-module) having a base of n elements, and given a subgroup N,
one can find a base

$x_1, x_2, \ldots, x_n$

for M, and unique positive integers $q_1, q_2, \ldots, q_r$ such that N has a base

$q_1 x_1, q_2 x_2, \ldots, q_r x_r$

with $q_1 \mid q_2 \mid \cdots \mid q_r$.

As a consequence, we have for the quotient group

12.1    $M/N \cong Z_{q_1} + Z_{q_2} + \cdots + Z_{q_r} + \underbrace{Z + \cdots + Z}_{n-r}$

Given now any abelian group G with n generators $y_1, \ldots, y_n$, we take a free
abelian group M with n generators $x_1, \ldots, x_n$, and we map the free Z-module M
onto the Z-module G by the mapping f:

$f(a_1 x_1 + \cdots + a_n x_n) = a_1 y_1 + \cdots + a_n y_n$

The kernel of f is some submodule N, and we have $M/N \cong G$. Applying the result
(12.1), we find unique positive integers $q_1, q_2, \ldots, q_r$ such that

$G \cong Z_{q_1} + \cdots + Z_{q_r} + \underbrace{Z + \cdots + Z}_{n-r}$

and $q_1 \mid q_2 \mid \cdots \mid q_r$.
For finite abelian groups we must have r = n here. For this case our result, in
slightly different form, is given in Sec. 7, Chap. 10. (See Exercise 7 of that section.)
The subgroup $Z_{q_1} + \cdots + Z_{q_r}$ is called the torsion subgroup of G; it may be
characterized as the subgroup of G which consists of all the elements of finite order.
The numbers $q_1, \ldots, q_r$ which are greater than 1 are called the torsion orders of
G. The number n − r of summands isomorphic to Z is called the rank of G.
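
The diagonalization over Z lends itself to a short computation. The Python sketch below is an illustration, not the procedure of Sec. 9 verbatim: it uses the usual reduction by elementary row and column operations with a pivot of least absolute value, and the names `smith_diagonal` and `group_invariants` are ours. The input is assumed to be an integer matrix whose columns generate the subgroup N of the free group on n generators, n being the number of rows.

```python
def smith_diagonal(matrix):
    """Diagonalize an integer matrix by elementary row and column operations.

    Returns the positive diagonal entries q_1 | q_2 | ... | q_r (r = rank).
    """
    a = [list(row) for row in matrix]
    m = len(a)
    n = len(a[0]) if m else 0
    diagonal = []
    t = 0
    while t < min(m, n):
        # Choose a nonzero entry of least absolute value in the remaining block.
        pivot = None
        for i in range(t, m):
            for j in range(t, n):
                if a[i][j] != 0 and (pivot is None
                                     or abs(a[i][j]) < abs(a[pivot[0]][pivot[1]])):
                    pivot = (i, j)
        if pivot is None:
            break                        # the remaining block is zero
        i, j = pivot
        a[t], a[i] = a[i], a[t]          # row interchange
        for row in a:                    # column interchange
            row[t], row[j] = row[j], row[t]
        p = a[t][t]
        clean = True
        for i in range(t + 1, m):        # reduce column t below the pivot
            q = a[i][t] // p
            a[i] = [v - q * w for v, w in zip(a[i], a[t])]
            clean = clean and a[i][t] == 0
        for j in range(t + 1, n):        # reduce row t to the right of the pivot
            q = a[t][j] // p
            for i in range(m):
                a[i][j] -= q * a[i][t]
            clean = clean and a[t][j] == 0
        if not clean:
            continue                     # a smaller remainder appeared; start over
        # The pivot must divide every entry of the remaining block.
        bad = next((i for i in range(t + 1, m) for j in range(t + 1, n)
                    if a[i][j] % p), None)
        if bad is not None:
            a[t] = [v + w for v, w in zip(a[t], a[bad])]
            continue
        diagonal.append(abs(p))
        t += 1
    return diagonal


def group_invariants(relations):
    """Torsion orders (> 1) and rank of Z^n / N, N spanned by the columns."""
    diagonal = smith_diagonal(relations)
    torsion_orders = [q for q in diagonal if q > 1]
    rank = len(relations) - len(diagonal)
    return torsion_orders, rank


example = group_invariants([[2, 0], [0, 3]])   # two generators, relations 2x = 0, 3y = 0
```

For the sample relations the result is the single torsion order 6 and rank 0, reflecting the fact that the direct sum of cyclic groups of orders 2 and 3 is cyclic of order 6.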


EXERCISES 


1. Prove that two finite abelian groups are isomorphic if and only if they have
the same torsion orders. (Hint: Use induction after characterizing the largest
torsion order.)
2. Prove that two finitely generated abelian groups are isomorphic if and only if
they have the same torsion orders and the same rank.


14 


Quadratic and Hermitian forms 


1. Introduction 


In Sec. 11, Chap. 8, we took up briefly the question of defining lengths of vectors 
and angles between vectors. In the present chapter we shall go into this question 
more methodically. 

It was pointed out in Chap. 8 that in general it does not make sense to try to
define length and angle in a vector space. Vector-space axioms deal only with
addition of vectors and with scalar multiplication. In order to define length of
vectors it is necessary to impose an additional structure upon a vector space. We
begin with some considerations which are closely related to the inner product of
Eq. (11.4), Chap. 8.


2. Linear functions; dual spaces


Let V be a vector space over a field K. By a linear function on V is meant a linear
mapping f: V → K. That is, f is required to satisfy

2.1    $f(x + y) = f(x) + f(y)$    for all x, y in V
2.2    $f(ax) = af(x)$    for all x in V, a in K

From this it follows that f(0) = 0 and f(−x) = −f(x) for any x in V (Sec. 3,
Chap. 9).


Denoting by V* the set of all such linear functions, we make V* into a vector
space over K by defining f + g and af by the rules

$(f + g)(x) = f(x) + g(x)$    f and g in V*, x in V
$(af)(x) = a \cdot f(x)$    f in V*, a in K, x in V

It is trivial to check that f + g and af are again linear functions on V, hence
elements of V*. We recall that V* is called the dual vector space of V (Sec. 4,
Chap. 9).
If $B = \{e_1, \ldots, e_n\}$ is a base for V, supposed of dimension n, and if $x = x^i e_i$ is
any vector in V, then for a linear function f we have

$f(x) = f(x^i e_i) = x^i f(e_i)$

by (2.2). Write

2.3    $f_i = f(e_i)$

so that

2.4    $f(x) = x^i f_i$


Thus f is completely determined by the n scalars $f_1, \ldots, f_n$. Conversely, given
any n-tuple of scalars $(f_1, \ldots, f_n)$, the formula (2.4) defines a linear function f,
plainly. In particular, taking $f_i = 0$ for $i \neq j$ and $f_j = 1$, we get a linear function,
call it $e^j$, for which (2.4) becomes

2.5    $e^j(x) = x^j$

Thus $e^j$ is none other than the mapping V → K that sends x into its jth component
relative to B. In particular,

2.6    $e^j(e_i) = \begin{cases} 0 & \text{if } i \neq j \\ 1 & \text{if } i = j \end{cases}$

The n linear functions $e^j$ so defined form a base $B^* = \{e^1, \ldots, e^n\}$ of V*. For if
$f_1, \ldots, f_n$ are scalars, then $(f_i e^i)(x) = f_i e^i(x) = f_i x^i$, by (2.2) and (2.5).
Therefore, if the $f_i$ are defined by (2.3), starting from any linear function f, it follows that
both f and $f_i e^i$ send any x in V into the same element of K. Hence

2.7    $f = f_i e^i$

On the other hand, the $e^i$ are linearly independent, as follows at once from (2.6).
The base B* is called the dual base associated with B. In particular, dim V* =
dim V.
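
For a space of column vectors the dual base can be computed outright. The following Python sketch is an illustration under stated assumptions: the vectors of a base of Q^n are given as the columns of a square matrix `p`, a linear function is represented by the row of its values on the canonical base, and the name `inverse` is ours. Since the rows of the inverse matrix take the value 1 on the matching column of `p` and 0 on the others, they represent the dual base.

```python
from fractions import Fraction


def inverse(p):
    """Invert a square rational matrix by Gauss-Jordan elimination."""
    n = len(p)
    aug = [[Fraction(v) for v in row] + [Fraction(int(i == j)) for j in range(n)]
           for i, row in enumerate(p)]
    for col in range(n):
        pivot = next((r for r in range(col, n) if aug[r][col] != 0), None)
        if pivot is None:
            raise ValueError("the columns do not form a base")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        lead = aug[col][col]
        aug[col] = [v / lead for v in aug[col]]
        for r in range(n):
            if r != col and aug[r][col] != 0:
                factor = aug[r][col]
                aug[r] = [v - factor * w for v, w in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]


# Assumed example: v_1 = e_1 + e_2 and v_2 = e_1 + 2 e_2, written as columns.
p = [[1, 1],
     [1, 2]]
dual = inverse(p)   # row j holds the components of the j-th dual base vector

# Value of the j-th dual base vector on the i-th base vector.
pairing = [[sum(dual[j][k] * p[k][i] for k in range(2)) for i in range(2)]
           for j in range(2)]
```

In this example the dual base consists of 2e^1 − e^2 and −e^1 + e^2, and `pairing` is the unit matrix, as required by (2.6).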

From Sec. 4, Chap. 9, we recall the notation

2.8    $(f, x) = f(x)$

introduced in order to put elements of V and V* on a kind of equal footing in
formulas. In this notation, formulas (2.3) to (2.7) become

        $x = x^i e_i$        $f = f_i e^i$
        $x^i = (e^i, x)$        $f_i = (f, e_i)$
2.9     $(f, x) = f_i x^i$
        $(e^i, e_j) = \begin{cases} 0 & i \neq j \\ 1 & i = j \end{cases}$

We recall the easily verified formulas (4.9), Chap. 9:

        $(f, x + y) = (f, x) + (f, y)$
2.10    $(f + g, x) = (f, x) + (g, x)$
        $(af, x) = (f, ax) = a(f, x)$

for any f, g in V*, x, y in V, and a in K. These are simply (2.1) and (2.2).




Let us denote by ( , x) the mapping that sends f in V* into the scalar (f, x).
From the second and third equations above it is clear that the creature ( , x) is a
linear mapping V* → K, hence is an element of the dual space V** of V*. The
correspondence x → ( , x) therefore gives us a mapping V → V**. It is a linear
mapping, as is plain from the first and third equations above. If an element x is sent
into zero by our mapping, that means (f, x) = 0 for all f, which is possible only
for x = 0. Now if V has finite dimension n, then we know that dim V* = n,
whence in turn dim V** = n. As was just pointed out, the kernel of V → V** is
zero, and it follows that our mapping must be an isomorphism. Hence, V and the
dual of V* are isomorphic in a natural way (if dim V ≠ ∞). For this reason we can
simply abolish V** in order to minimize the number of vector spaces at hand; every
element of V** can be unambiguously represented by an element of V, and vice
versa.

But in general the dual space V* cannot be swept under the carpet so easily. Of
course V and V*, having equal dimension, must be isomorphic. However it is an
important fact that in general there is no naturally determined isomorphism V →
V*. If we choose a base $e_1, \ldots, e_n$ in V, then we can define an isomorphism T by
the requirement $T(e_i) = e^i$ (dual base), for example. But it is easily seen that T so
defined depends strongly on the choice of base in V.† Some vector spaces have a
distinguished base, and if we agree in that case to define T by means of the
distinguished base, then it becomes fixed once for all, and V* becomes superfluous in the
same way as V** above.

† For example, if dim V = 1 and if we define T: V → V* by T(e) = e* (dual base of e
in V), then replacing e by a new base ce (and e* by (1/c)e*) changes T to $(1/c^2)T$.

This situation arises in particular with the space $K^n$ of n × 1 matrices (that is,
column vectors), which has the distinguished base $e_1, \ldots, e_n$ in which $e_i$ is the
ith column of the unit matrix $I_n$. Any linear function f on $K^n$ is completely
determined by the scalars $f_i = f(e_i)$, according to (2.3), and f can be unambiguously
represented by the n-tuple

$$
\begin{pmatrix} f_1 \\ \vdots \\ f_n \end{pmatrix}
$$

which is an element of $K^n$. Because of our index conventions, however, it is
convenient to think of this n-tuple as a row vector $(f_1, \ldots, f_n)$, consequently as an
element of $K_n$ (we point out again that $K^n$ and $K_n$ differ only in notation). Thus
$K_n$ can be thought of as the dual of $K^n$, and vice versa.
As was already pointed out in Chap. 9, the index convention in V (lower indices
on base vectors, upper indices on components) leads in a compelling way to the
opposite convention in the dual space V*. These conventions are natural and
convenient. Moreover, they are the well-established conventions of tensor calculus.
Various other index systems are in common use, for example (as in Chap. 8),
keeping all indices as subscripts. But all index conventions have at some point or
other a minor flaw, the result of which is to force the user to employ the transpose
of a matrix where he did not expect to. We shall point out the difficulty in our
system. It is one which will not cause us much concern.

Let U, V be vector spaces over a field K, with dim U = m and dim V = n, and
let T: U → V be a linear mapping. If $B_1 = \{u_i\}$ and $B_2 = \{v_j\}$ are bases for U,
V, then the corresponding matrix of T is $a = (a^j_i)$, where $T(u_i) = a^j_i v_j$. By
Definition 5.1, Chap. 9, the components $a^1_i, \ldots, a^n_i$ of $T(u_i)$ form the ith column of
a, which is therefore n × m.

Now if f is an element of V*, then $f \circ T$ is an element of U*, being a linear
mapping U → K. Thus the correspondence $f \to f \circ T$ maps V* to U*. This mapping,
denoted by T*, is linear and is called the transpose of T. It is characterized by the
relation

2.11    $(T^*(f), x) = (f, T(x))$    x in U, f in V*

(See Theorem 4.4, Chap. 9.)
What is the matrix of T*? Let $B_1^* = \{u^i\}$ and $B_2^* = \{v^j\}$ be the dual bases of
$B_1$, $B_2$. We must compute the components of $T^*(v^j)$ relative to the base $B_1^*$ in U*.
According to (2.9) those components are the quantities

$(T^*(v^j), u_i)$

By (2.11), this is the same as

$(v^j, T(u_i)) = (v^j, a^k_i v_k) = a^k_i (v^j, v_k) = a^j_i$

using (2.9). Therefore,

2.12    $T^*(v^j) = a^j_i u^i$

It would therefore seem natural to take $(a^j_i) = a$ as the matrix of T* relative to
the base pair $B_2^*$, $B_1^*$. But that violates the convention of Definition 5.1, Chap. 9,
according to which the matrix of T* relative to $B_2^*$, $B_1^*$ must have $a^j_1, a^j_2, \ldots, a^j_m$
in its jth column. That is, the matrix of T* relative to $B_2^*$, $B_1^*$ is the transpose ${}^t a$.
We state this as the following proposition:

PROPOSITION 2.1 Let U and V be finite-dimensional vector spaces over a field K, and
let T be a linear mapping U → V. Let a be the matrix of T relative to a base pair
$B_1$, $B_2$ in U, V, respectively. Then the transpose T*: V* → U* of T, mapping g in
V* into $g \circ T$, has ${}^t a$ as its matrix relative to the dual base pair $B_2^*$, $B_1^*$.
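
The relation between a mapping and its transpose is easily checked on numbers. In the Python sketch below, offered only as an illustration, the matrix `a`, the vector `x` and the linear function `f` are assumed sample data, vectors are lists of components relative to the chosen bases, and the helper names are ours. It verifies the characteristic relation (T*(f), x) = (f, T(x)) and confirms that the components of T*(f), arranged as a column, are obtained by applying the transposed matrix.

```python
def apply(a, x):
    """Components of T(x): the matrix a applied to the column x."""
    return [sum(a[j][i] * x[i] for i in range(len(x))) for j in range(len(a))]


def pair(f, x):
    """The scalar (f, x) = f_i x^i."""
    return sum(fi * xi for fi, xi in zip(f, x))


def transpose(a):
    """The transposed matrix."""
    return [list(column) for column in zip(*a)]


def apply_transpose_map(a, f):
    """Components of f o T in the dual base: (f o T)(u_i) = f_j a^j_i."""
    return [sum(f[j] * a[j][i] for j in range(len(a))) for i in range(len(a[0]))]


a = [[1, 2],
     [3, 4],
     [0, 1]]          # T maps a 2-dimensional U into a 3-dimensional V
x = [5, -1]           # a vector of U
f = [1, 0, 2]         # a linear function on V

left = pair(apply_transpose_map(a, f), x)    # (T*(f), x)
right = pair(f, apply(a, x))                 # (f, T(x))
as_column = apply(transpose(a), f)           # transposed matrix applied to f
```

Both `left` and `right` equal 1 here, and `as_column` coincides with the components of T*(f).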




EXERCISES 

1. Let $B = \{e_i\}$ and $B_1 = \{v_i\}$ be two bases in V (i = 1, ..., n). Let p be
the matrix from B to $B_1$. That is, $v_i = p^j_i e_j$. Prove that ${}^t p^{-1}$ is the matrix from
the dual base B* to the dual base $B_1^*$.

2. Let V be a vector space of dimension n, and let B* be a base in the dual
space V*. Prove that B* is the dual of some base in V.

3. Let V be an n-dimensional vector space, and let V* be its dual. Let W* be
an r-dimensional subspace of V*. Prove that all x in V such that (f, x) = 0 for
every f in W* form an (n − r)-dimensional subspace U of V. Conversely, given
such a U, show that all f in V* such that (f, x) = 0 for every x in U form an
r-dimensional subspace W* of V*. (U is called the orthogonal complement of W*, and
vice versa.)

4. V being a vector space of dimension n, let $f_1, \ldots, f_k$ be k linear functions
on V. Let V' consist of all x in V such that $(f_j, x) = 0$ for j = 1, ..., k. Show
that V' is a subspace of V of dimension ≥ n − k, equality holding if and only if
$f_1, \ldots, f_k$ are linearly independent. Show that every subspace V' of V can be
obtained in this way.

5. Let $B = \{e_1, e_2\}$ be a base for a vector space V, and put $B_1 = \{e_1 - e_2,$
$2e_1 + e_2\}$. Show that $B_1$ is a base in V provided that 1 + 1 + 1 ≠ 0 in the field
of scalars, and determine the dual base $B_1^*$ in terms of B*.

6. Let $B = \{e_1, \ldots, e_n\}$ be a base for a vector space V, and let $B^* =$
$\{e^1, \ldots, e^n\}$ be its dual base. Set $g = e^1 + c_2 e^2 + \cdots + c_n e^n$. Show that
$\{g, e^2, \ldots, e^n\}$ is a base for V*, for any scalars $c_i$. Find the base in V of which
it is the dual.

7. Q being the rational field, let T: $Q^3 \to Q^3$ be the linear mapping that sends
a column vector x into ax, where


fj 2 -1 tn) 
s-(3 4 2 
1 1 5, 


Where does the transpose T* of T map the vectors (4, 0, 3) and (1, 1, 1)?
8. Let U, V be finite-dimensional vector spaces over a field K, and let S: V* →
U* be a linear mapping of their duals. Prove the following:
(a) There is a unique linear mapping T: U → V such that S = T*.
(b) T is a monomorphism if and only if S is an epimorphism.
(c) T is an epimorphism if and only if S is a monomorphism.
(d) rank T = rank S.
9. Show that $e_i$ spans the orthogonal complement of the subspace of V*
spanned by $e^1, \ldots, e^{i-1}, e^{i+1}, \ldots, e^n$.
10. If x is a vector in V such that (f, x) = 0 for all f in V*, prove that x = 0 if
dim V ≠ ∞.




il. Let B = [e,, e:, e;] be a base for a vector space V over Q, and let B* = 
fel, e?, e#} be the dual base. Set y' = e! + 2e’ + ev? = —e? + Sei, v = e! + 20% 
Find the base in V of which {v', v’, v} is the dual. 


3. Bilinear functions 


Again let V denote a vector space over a field K. By a bilinear function (or form)
f on V is meant a mapping which assigns to every ordered pair of vectors x, y in V
an element f(x, y) in K and which is linear in both x, y. In other words, denoting
by V × V the set of all ordered pairs of vectors x, y in V, f is a mapping V × V →
K satisfying the following conditions:

        $f(x, y + z) = f(x, y) + f(x, z)$
3.1     $f(x + z, y) = f(x, y) + f(z, y)$
        $f(ax, y) = f(x, ay) = a f(x, y)$

for any x, y, z in V and any a in K.

Such a function is called symmetric if f(x, y) = f(y, x) for any x, y; f is
skew-symmetric if f(x, y) = −f(y, x) for any x, y.


EXAMPLE 1 Let f be any bilinear function on V. Define g and h by g(x, y) =
f(x, y) + f(y, x) and h(x, y) = f(x, y) − f(y, x). Then g is a symmetric bilinear
function on V, and h is a skew-symmetric bilinear function.

EXAMPLE 2 Define f on $K^n$ as follows: if x and y are any two vectors in $K^n$, put

$f(x, y) = x^1 y^1 + x^2 y^2 + \cdots + x^n y^n = \sum_i x^i y^i$

It is quickly seen that this f is a bilinear function on $K^n$. For $R^n$ we have already
encountered it in (11.4), Chap. 8.
The following definition shows how to construct bilinear functions from linear
functions:

DEFINITION 3.1. Let $f_1$ and $f_2$ be two linear functions on a vector space V. Then $f_1 \otimes f_2$
will denote the bilinear function defined by

3.2    $(f_1 \otimes f_2)(x, y) = f_1(x) \cdot f_2(y) = (f_1, x) \cdot (f_2, y)$

On the right is indicated a product of two scalars. It is trivial to verify that
$f_1 \otimes f_2$ is a bilinear function.

We denote the set of all bilinear functions on V by $V_2^*$. Just as with V* of
Sec. 2, we can make $V_2^*$ into a vector space over the field of scalars K by the
following definitions of sum f + g and scalar product af:

        $(f + g)(x, y) = f(x, y) + g(x, y)$
3.3
        $(af)(x, y) = a \cdot f(x, y)$
Bilinear functions 48g 
for any bilinear functions f, g, for any x, y in V, and any ain K. It is quickly 
verified that f + g and af defined in this manner are bilinear functions and that 


Vz thereby becomes a vector space over K. 
In particular, if $f_1, \ldots, f_r, g_1, \ldots, g_r$ are linear functions on V, we can form
sums of the type

        $a_1 (f_1 \otimes g_1) + a_2 (f_2 \otimes g_2) + \cdots + a_r (f_r \otimes g_r)$

to obtain bilinear functions on V. We shall soon see that, if dim V < $\infty$, then
any bilinear function can be expressed in this way.

Now let V be a vector space of dimension n over K, and let $B = \{e_1, \ldots, e_n\}$
be a base for V. For any bilinear function f on V, consider the scalars

(3.4)   $a_{ij} = f(e_i, e_j)$   (i, j = 1, ..., n)

The $a_{ij}$ form an $n \times n$ matrix $a = (a_{ij})$ in which the first index is understood to be
the row index.

DEFINITION 3.2 The $n \times n$ matrix a of (3.4) is called the matrix of f relative to the
base B.

If $x = x^i e_i$ and $y = y^j e_j$ are any two elements of V, then using (3.1) repeatedly
we get

(3.5)   $f(x, y) = f(x^i e_i, y^j e_j) = x^i y^j f(e_i, e_j) = a_{ij} x^i y^j$

Conversely it is clear that if a is any $n \times n$ matrix with coefficients in K, then
formula (3.5) defines a bilinear function f on V. From the definition of matrix
multiplication we can write (3.5) in the form

(3.5')  $f(x, y) = {}^t\hat{x}\, a\, \hat{y}$

where $\hat{x}$ denotes the column vector made up of $x^1, \ldots, x^n$, and similarly for $\hat{y}$.
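
Formula (3.5) can be tried out numerically. The following is a minimal Python sketch, not part of the text: the helper names `bilinear` and `is_symmetric` are ours, vectors are plain lists of components relative to the base B, and the sample matrix and vectors are chosen only for illustration.

```python
def bilinear(a, x, y):
    """Evaluate f(x, y) = a_ij x^i y^j, as in formula (3.5)."""
    n = len(a)
    return sum(a[i][j] * x[i] * y[j] for i in range(n) for j in range(n))


def is_symmetric(a):
    """True when a_ij = a_ji for all i, j."""
    n = len(a)
    return all(a[i][j] == a[j][i] for i in range(n) for j in range(n))


# Illustrative data (an assumption of this sketch, not taken from the text).
a = [[1, 2],
     [2, -1]]
x = [3, 1]
y = [1, -2]

value_xy = bilinear(a, x, y)
value_yx = bilinear(a, y, x)
```

Because the sample matrix is symmetric, the two values agree, in line with the remark that f is symmetric exactly when its matrix is.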


EXAMPLE 3 Let $B = \{e_1, \ldots, e_n\}$ be the canonical base in $K^n$. That is, $e_i$ is
the ith column of the unit matrix. Referring to Example 2, the matrix of the
bilinear form $x^1 y^1 + \cdots + x^n y^n$ relative to B is the unit matrix $I_n$.

It follows at once from (3.4) and (3.5) that f is symmetric if and only if its
matrix a is symmetric, that is, $a_{ij} = a_{ji}$, or ${}^t a = a$; f is skew-symmetric if and only
if its matrix a is skew-symmetric, that is, $a_{ij} = -a_{ji}$, or ${}^t a = -a$.

Equation (3.5) can be put in slightly different form as follows: 


PROPOSITION 3.1 Let a be the matrix of a bilinear function f relative to a base
$B = \{e_1, \ldots, e_n\}$ of the vector space V. Let $B^* = \{e^1, \ldots, e^n\}$ be the dual base in
V*. Then

(3.6)   $f = a_{ij}\, e^i \otimes e^j$

Conversely, if (3.6) holds, then $a = (a_{ij})$ is the matrix of f relative to B.

Proof. From (3.2) and (3.3) we have

        $(a_{ij}\, e^i \otimes e^j)(x, y) = a_{ij} \langle e^i, x \rangle \langle e^j, y \rangle = a_{ij} x^i y^j$

the latter by (2.9). The assertion follows from (3.5). Q.E.D.

Let us now see how the matrix of a bilinear function f changes with a change of
base. Let $B' = \{e'_1, \ldots, e'_n\}$ be another base in V, and let a' be the corresponding
matrix of f. That is, $a'_{ij} = f(e'_i, e'_j)$. Let $p = (p^i_j)$ be the matrix from B to B'.
That is,

(3.7)   $e'_j = p^h_j e_h$

Then

        $f(e'_i, e'_j) = f(p^h_i e_h, p^k_j e_k) = p^h_i p^k_j f(e_h, e_k)$

using (3.5), and so

(3.8)   $a'_{ij} = p^h_i\, a_{hk}\, p^k_j$

As a matrix product this is

(3.9)   $a' = {}^t p\, a\, p$

(Recall that h is the row index in $p^h_i$, and in $a_{hk}$, which accounts for the transpose
in the first factor.) Observe that (3.9) is not the same as the rule for the change
of the matrix of a linear mapping $V \to V$.
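
The rule (3.9) is easy to check on a small case. The sketch below is an illustration under our own conventions, not the authors' notation: matrices are lists of rows, column j of `p` holds the components of the new base vector e'_j, and the helper names are hypothetical.

```python
def transpose(m):
    """Return the transposed matrix as a list of rows."""
    return [list(row) for row in zip(*m)]


def matmul(p, q):
    """Ordinary matrix product of two lists of rows."""
    return [[sum(p[i][k] * q[k][j] for k in range(len(q)))
             for j in range(len(q[0]))] for i in range(len(p))]


def change_of_base(a, p):
    """Return a' = (transpose of p) * a * p, as in formula (3.9)."""
    return matmul(matmul(transpose(p), a), p)


# Sample data chosen for this sketch: f(x, y) = x^1 y^2 + x^2 y^1,
# with new base e'_1 = e_1 + e_2 and e'_2 = e_1 - e_2.
a = [[0, 1],
     [1, 0]]
p = [[1, 1],
     [1, -1]]

a_new = change_of_base(a, p)
```

For this choice the new matrix comes out diagonal, which anticipates the reduction to diagonal form treated later in the chapter.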

Since both p and ‘p must be nonsingular, it follows from Sec. 8, Chap. 9, that 
a and a’ have the same rank. Therefore we can define the rank of f as follows: 
DEFINITION 3.3 Let f be a bilinear function on an n-dimensional vector space V. Then
the rank of f is defined to be the same as the rank of the matrix of f relative to any base
in V.

Another interpretation of the rank of f can be given as follows: Let f(x, _) denote
the operation that sends any y in V into the scalar f(x, y). Then f(x, _) is a
linear mapping $V \to K$, by (3.1), hence is an element of V*. We have therefore a
mapping $T_1: V \to V^*$ sending x into f(x, _), easily seen to be linear. Let us compute
its matrix relative to the base pair B, B*. By definition, $T_1(e_i)$ = f($e_i$, _),
and this sends $e_j$ into $f(e_i, e_j) = a_{ij}$. Therefore, by (2.9),

(3.10)  $T_1(e_i) = f(e_i, \_) = a_{ij} e^j$

From Definition 5.1, Chap. 9, the matrix of $T_1$ relative to the base pair B, B* is
precisely ${}^t a$.

In a similar way we obtain a second mapping $T_2: V \to V^*$, sending x into
f(_, x), where this symbol denotes the operation that maps any y in V into f(y, x).
Then

(3.11)  $T_2(e_j) = f(\_, e_j) = a_{ij} e^i$

The matrix of $T_2$ relative to B, B* is a. The rank of f is therefore none other than
the rank of $T_1$ or $T_2$, by Theorem 8.2, Chap. 9. Observe that $T_1 = T_2$ if f is
symmetric.


REMARK. In Secs. 2 and 3 we have defined linear functions f(x) and bilinear
functions f(x, y) on a vector space. There is no reason for stopping at 2; in a similar
way we can define 3-linear functions f(x, y, z) of ordered triples of vectors in V, and
so on. This is indeed the subject of Chap. 16. Bilinear functions are examples of
tensors on V. They are sometimes called dyadic tensors.


EXERCISES

1. Let V be a vector space of dimension n, and let $f^1, \ldots, f^n$ be a base for its
dual space V*. Prove that the elements $f^i \otimes f^j$ (i, j = 1, ..., n) form a base of
the space of bilinear functions $V_2^*$. Prove that dim $V_2^* = n^2$.

2. Let f be a bilinear function on a vector space V over a field K, and suppose
that $1 + 1 \neq 0$ in K. Prove that f can be written uniquely as a sum f' + f'',
where f' is a symmetric bilinear function and where f'' is a skew-symmetric bilinear
function.

3. Let V be a vector space of dimension n. Let $V_2^*$ be the space of bilinear
functions on V. Show that the symmetric elements of $V_2^*$ form a subspace $U_1$,
and show that the skew-symmetric elements form a subspace $U_2$. Compute dim $U_1$
and dim $U_2$. If $1 + 1 \neq 0$ in the field of scalars, prove that $V_2^*$ is the direct sum
of $U_1$ and $U_2$.

4. Let $B = \{e_1, e_2\}$ be a base for a vector space V over the real field R. Let f
be the bilinear function on V whose matrix with respect to B is

        $\begin{pmatrix} 1 & 2 \\ 2 & -1 \end{pmatrix}$

Find a new base in V with respect to which the matrix of f is the unit matrix.
Compute f(x, x), f(y, y), and f(x, y), where $x = 3e_1 + e_2$ and $y = e_1 - 2e_2$.

5. Let $g^1, \ldots, g^r$ be linear functions on an n-dimensional vector space V. Set

        $f = g^1 \otimes g^1 + \cdots + g^r \otimes g^r$

Prove that the rank of f is $\leq r$, and give a reasonable condition for equality to hold.

6. Referring to Definition 3.1, under what circumstances is $f_1 \otimes f_2 = f_2 \otimes f_1$?

7. Let f be a bilinear function of rank n on a vector space V of dimension n
(f is then called nondegenerate). Let U be an r-dimensional subspace of V. Let
W consist of all x in V such that f(x, u) = 0 for every u in U. Show that W is a
subspace of dimension n - r.


4. Quadratic forms 


Let V denote a vector space of dimension n over a field K. If f is a bilinear function
on V, then we get a mapping $Q: V \to K$ defined by Q(x) = f(x, x). Such mappings
are of considerable importance and are the subject of this section.

DEFINITION 4.1 A mapping $Q: V \to K$ is called a quadratic form on V if there is a
symmetric bilinear function f on V such that Q(x) = f(x, x). We call Q the quadratic
form associated with f.

Let us work out some simple properties of such a Q. First of all, $Q(ax) =
f(ax, ax) = a^2 f(x, x)$, by (3.1). Hence

(4.1)   $Q(ax) = a^2 Q(x)$

In particular,

(4.2)   $Q(-x) = Q(x)$

Further, using (3.1) again,

        $Q(x + y) = f(x + y, x + y) = f(x, x + y) + f(y, x + y)$
        $\qquad\quad\;\; = f(x, x) + f(x, y) + f(y, x) + f(y, y)$

Since f is assumed to be symmetric, there results

(4.3)   $Q(x + y) = Q(x) + 2 \cdot f(x, y) + Q(y)$

where 2 denotes the element 1 + 1 of K. This equation shows that if $1 + 1 \neq 0$
(that is, if K is not of characteristic 2), then f is completely determined by Q.
Sometimes f is called the polar form of Q.

Replacing y by -y above, we obtain

        $Q(x - y) = Q(x) - 2 f(x, y) + Q(y)$

Therefore

(4.4)   $Q(x + y) - Q(x - y) = 4 f(x, y)$


PROPOSITION 4.1 If Q is a quadratic form on a vector space V, then the function h
defined by

(4.5)   $h(x, y) = Q(x + y) - Q(x - y)$

is a symmetric bilinear function on V. Conversely, if Q is a mapping $V \to K$ for
which (4.5) holds, if Q(0) = 0, and if $1 + 1 \neq 0$ in K, then Q is a quadratic form
on V. It is associated with a uniquely determined symmetric bilinear function on V,
namely, $\tfrac{1}{4} h$.

Proof. The first assertion follows from (4.4). For the converse, if
$1 + 1 \neq 0$ in K, then $(1 + 1)^2 \neq 0$. This element is denoted above by 4.
We have

        $\tfrac{1}{4} h(x, x) = h(\tfrac{1}{2} x, \tfrac{1}{2} x) = Q(\tfrac{1}{2} x + \tfrac{1}{2} x) - Q(0)$

by (4.1) and (4.5). Since Q(0) = 0, by assumption, we get $\tfrac{1}{4} h(x, x) = Q(x)$.
Q cannot be associated with any other symmetric bilinear function,
by (4.4). Q.E.D.
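
Formula (4.4) gives a practical recipe for recovering f from Q when division by 4 is possible. The Python sketch below illustrates this over the rational numbers; the sample form `Q` and the helper name `polar` are assumptions made for the illustration, not notation from the text.

```python
from fractions import Fraction


def Q(v):
    """A sample quadratic form on the rational plane (illustrative choice)."""
    x, y = v
    return 4 * x * x + 6 * x * y + 5 * y * y


def polar(form, u, v):
    """Recover f(u, v) = (1/4) * (Q(u + v) - Q(u - v)), following (4.4)."""
    plus = [s + t for s, t in zip(u, v)]
    minus = [s - t for s, t in zip(u, v)]
    return Fraction(form(plus) - form(minus), 4)


u = [Fraction(1), Fraction(2)]
v = [Fraction(3), Fraction(-1)]

f_uv = polar(Q, u, v)   # value of the associated bilinear function at (u, v)
f_uu = polar(Q, u, u)   # should reproduce Q(u)
```

Exact rational arithmetic is used so that the division by 4 introduces no rounding; in a field of characteristic 2 the same recipe is unavailable, which is the point of the hypothesis on 1 + 1.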


EXAMPLE 1 Consider the vector space $K^n$ of $n \times 1$ matrices (column vectors).
Let $a = (a_{ij})$ be any symmetric $n \times n$ matrix with coefficients in K. Define Q by

        $Q(x) = a_{ij} x^i x^j$

where x is the column vector with entries $x^1, \ldots, x^n$. Then

        $Q(x + y) - Q(x - y) = a_{ij}(x^i + y^i)(x^j + y^j) - a_{ij}(x^i - y^i)(x^j - y^j)$
        $\qquad = 2 a_{ij} x^i y^j + 2 a_{ij} y^i x^j$
        $\qquad = 4 a_{ij} x^i y^j$

since $a_{ij} = a_{ji}$. Q is the quadratic form associated with the bilinear function f
defined by

        $f(x, y) = a_{ij} x^i y^j$

From the definition of matrix multiplication, we have

        $f(x, y) = {}^t x\, a\, y = {}^t y\, a\, x$

and

        $Q(x) = {}^t x\, a\, x$

In particular, if a is the unit matrix, then this reduces to

        $f(x, y) = {}^t x\, y = {}^t y\, x$

and

        $Q(x) = {}^t x\, x = \sum_{i=1}^{n} (x^i)^2$

We have already encountered these in Sec. 11, Chap. 8.


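EXAMPLE 2 Consider the mapping Q of $R^2$ to R defined by

        $Q(v) = 4x^2 + 6xy + 5y^2$   where $v = \begin{pmatrix} x \\ y \end{pmatrix}$

This is the special case of the foregoing corresponding to the matrix

        $\begin{pmatrix} 4 & 3 \\ 3 & 5 \end{pmatrix}$

The bilinear function associated with Q is given by

        $f(v, v') = 4xx' + 3(xy' + x'y) + 5yy'$

where $v = \begin{pmatrix} x \\ y \end{pmatrix}$ and $v' = \begin{pmatrix} x' \\ y' \end{pmatrix}$.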


If we define addition and scalar multiplication of quadratic forms on V by

(4.6)   $(Q + Q')(x) = Q(x) + Q'(x)$
        $(aQ)(x) = a \cdot Q(x)$

then it follows easily from Definition 4.1 that Q + Q' and aQ are again quadratic
forms on V. In this way we make the set of all quadratic forms on V into a vector
space over K.

The following definition shows how quadratic forms can be built from linear
functions on V.


DEFINITION 4.2 Let $f_1$ and $f_2$ be two linear functions on a vector space V over K. Then
by $f_1 f_2$ we denote the mapping $V \to K$ defined by

(4.7)   $(f_1 f_2)(x) = f_1(x) \cdot f_2(x) = \langle f_1, x \rangle \langle f_2, x \rangle$

If $1 + 1 \neq 0$ in K, then $f_1 f_2$ is a quadratic form on V. If $f_1 = f_2$, then that is so
for any K.

In fact, $f_1 f_2$ is none other than the quadratic form associated with the symmetric
bilinear function $\tfrac{1}{2}(f_1 \otimes f_2 + f_2 \otimes f_1)$, assuming that $2 \neq 0$ (see Definition 3.1);
$f_1 f_1$ is the quadratic form associated with $f_1 \otimes f_1$.

Observe that $f_1 f_2 = f_2 f_1$, as follows at once from (4.7). For $f_1 f_1$ we naturally
write $f_1^2$, etc. Using the definitions of (4.6), we can build arbitrary linear combinations
of such quadratic forms $f_1 f_2$ (assuming that $1 + 1 \neq 0$ in K) to obtain other
quadratic forms on V.

Corresponding to Proposition 3.1 we have the simple result below. We recall
that the matrix of a symmetric bilinear function on a vector space (of finite dimension),
relative to any base, is symmetric.


PROPOSITION 4.2 Let f be a symmetric bilinear function on a vector space V, and let
a be its matrix relative to a base $B = \{e_1, \ldots, e_n\}$ in V. If $B^* = \{e^1, \ldots, e^n\}$
denotes the dual base in the space of linear functions on V, then for the quadratic form
Q associated with f we have

(4.8)   $Q = a_{ij}\, e^i e^j$

Conversely, given any symmetric $n \times n$ matrix a with coefficients in the field of scalars
K, then (4.8) defines a quadratic form on V. If $1 + 1 \neq 0$ in K, then a is uniquely
determined by Q and is also called the matrix of Q relative to B.

Proof. Equation (4.8) follows at once from (3.6) and Definition 4.2
above. For the converse, given a matrix a of the type stated, (3.6) defines
a symmetric bilinear function f having Q as its associated quadratic form.
Now no other matrix a' can give the same f, by Proposition 3.1, and no
other f can give the same Q, if $1 + 1 \neq 0$ in K, by Proposition 4.1. Therefore
no two different matrices can give the same Q, if $1 + 1 \neq 0$. Q.E.D.


Equation (4.8) can be expressed in different ways. If x is any vector in V, and
if its components with respect to B are $x^1, \ldots, x^n$, then $\langle e^i, x \rangle = x^i$, by (2.9).
According to (4.6), (4.7), and (4.8) we have then

        $Q(x) = a_{ij} \langle e^i, x \rangle \cdot \langle e^j, x \rangle$

or

(4.9)   $Q(x) = a_{ij} x^i x^j$

which is of course just (3.5) with x = y. Denoting by $\hat{x}$ the column vector whose
elements are $x^1, \ldots, x^n$, we have the matrix form of (4.9),

(4.10)  $Q(x) = {}^t\hat{x}\, a\, \hat{x}$

a special case of (3.5').
REMARK 1. If 1 + 1 = 0 in K, then (4.8) reduces to

(4.11)  $Q = \sum_{i=1}^{n} a_{ii} (e^i)^2$

For the mixed terms appear in pairs. For example,

        $a_{12} e^1 e^2 + a_{21} e^2 e^1 = (1 + 1) a_{12} e^1 e^2 = 0$

since $a_{12} = a_{21}$ and $e^1 e^2 = e^2 e^1$. Hence, all the mixed terms in (4.8) add up to
zero. It follows that the uniqueness assertion of Proposition 4.2 is false in this
case.


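EXAMPLE 3 If 1 + 1 = 0 in K, then we have

        $(x \;\; y) \begin{pmatrix} 1 & 1 \\ 1 & 1 \end{pmatrix} \begin{pmatrix} x \\ y \end{pmatrix} = x^2 + (1 + 1)xy + y^2 = x^2 + y^2$

and

        $(x \;\; y) \begin{pmatrix} 1 & 0 \\ 0 & 1 \end{pmatrix} \begin{pmatrix} x \\ y \end{pmatrix} = x^2 + y^2$

for any vector $\begin{pmatrix} x \\ y \end{pmatrix}$ in $K^2$. Thus the quadratic form $x^2 + y^2$ is given by two
different matrices.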
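
The failure of uniqueness in characteristic 2 can be checked by brute force over the field with two elements. The following Python sketch is our own illustration: arithmetic modulo 2 stands in for K, and the two matrices are sample symmetric matrices that agree on the diagonal but differ off it.

```python
def q_mod2(a, v):
    """Q(v) = a_ij v^i v^j computed in the field with two elements."""
    n = len(a)
    return sum(a[i][j] * v[i] * v[j] for i in range(n) for j in range(n)) % 2


a1 = [[1, 1],
      [1, 1]]
a2 = [[1, 0],
      [0, 1]]

# The plane over the two-element field has exactly four vectors.
vectors = [(x, y) for x in (0, 1) for y in (0, 1)]
same_form = all(q_mod2(a1, v) == q_mod2(a2, v) for v in vectors)
```

Here `same_form` records that the two different symmetric matrices define the same mapping Q, because the mixed terms occur in pairs and cancel when 1 + 1 = 0.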


REMARK 2. Let f be any bilinear function on V, and assume that $1 + 1 \neq 0$ in
K. By Example 1, Sec. 3, we can write f = g + h, where g is symmetric and h
is skew-symmetric. Then f(x, x) = g(x, x) + h(x, x). But h(x, y) = -h(y, x),
whence h(x, x) = -h(x, x), or (1 + 1)h(x, x) = 0. Therefore h(x, x) = 0. This
shows why we deal only with symmetric bilinear functions in the context of quadratic
forms.


EXERCISES

1. Let Q be the quadratic form on $K^n$ defined by $Q(x) = x_1^2 + \cdots + x_n^2$ for
$x = {}^t(x_1, \ldots, x_n)$. Find the matrix of Q relative to each of the following bases,
assuming $1 + 1 \neq 0$ in K.

(a) $e_i = (0, \ldots, 1, \ldots, 0)$ (1 in the ith place)

(b) $f_i = (0, \ldots, 2, \ldots, 0)$ (2 in the ith place)

(c) $g_i = (0, \ldots, \tfrac{1}{2}, \ldots, 0)$ ($\tfrac{1}{2}$ in the ith place)

(d) $h_1 = e_1 + e_2$, $h_2 = e_1 - e_2$, $h_i = e_i$ for i > 2

(e) Taking n = 2, the base $e'_1 = 3e_1 - e_2$, $e'_2 = e_1 + 4e_2$

Verify (3.9) for case (e).

2. Let R denote the real field and define

        $Q: R^2 \to R$   by   $Q(v) = ax^2 + 2bxy + cy^2$

where $v = \begin{pmatrix} x \\ y \end{pmatrix}$ and where a, b, c are given real numbers. Find the matrix of Q
relative to the canonical base B of $R^2$.

3. With Q as in Exercise 2, suppose there are vectors v, w in $R^2$ such that Q(v) > 0
and Q(w) < 0. Prove that there is a nonzero vector u in $R^2$ such that Q(u) = 0.
Prove that there is a base $B' = \{e'_1, e'_2\}$ such that $Q(v) = x'^2 - y'^2$ if $v = x' e'_1 + y' e'_2$.

4. With Q as in Exercise 2, find necessary and sufficient conditions on a, b, c
such that Q(v) > 0 whenever $v \neq 0$.

5. Let Q be a nonzero quadratic form on a vector space over the complex field.
Prove that there exist vectors x, y in the space such that Q(x) is a negative real
number and Q(y) is a pure imaginary number $\neq 0$.

6. Let Q be a quadratic form on a vector space V, and let $T: V \to V$ be a linear
mapping. Define Q' by Q'(x) = Q(T(x)). Show that Q' is a quadratic form on V,
and if dim V < $\infty$ show how to compute the matrix of Q' from those of Q and T.


5. Reduction to diagonal form


Let V be an n-dimensional vector space over a field K, and let f be a symmetric
bilinear function on V, with associated quadratic form Q. That is, Q(x) = f(x, x).
In general the matrix of f (or Q) relative to a base B in V will contain nonzero
terms off the diagonal, giving rise to mixed terms in (3.6) or (4.8). In view of Remark 1
of the preceding section the mixed terms can be omitted from (4.8) if
1 + 1 = 0 in K.

Then, assuming $1 + 1 \neq 0$ in K, we propose to show that it is possible to find
a base in V for which the matrix of f and Q is a diagonal matrix. The method that
we use comes from the ancient Babylonian algebraists and amounts to "completing
the square."

Start with any base $B = \{e_1, \ldots, e_n\}$ in V, and let a be the matrix of f and Q
relative to B. We naturally assume that $f \neq 0$, so that a contains at least one
nonzero element.

To achieve our goal we are going to make a series of base changes, and for our
purposes it will be most convenient to do that in the dual space V* of V. Our first
step is to show that we can assume $a_{11} \neq 0$, after a possible preliminary base
change.

Suppose first of all that all the diagonal elements $a_{ii}$ of a are zero. That situation
can be remedied as follows: Let $a_{hk}$ be a nonzero element of a ($h \neq k$), and define
a new base $B' = \{e'_1, \ldots, e'_n\}$ in V by

(5.1)   $e'_i = e_i$   for $i \neq k$
        $e'_k = e_h + e_k$

We note that the dual bases $B^* = \{e^1, \ldots, e^n\}$ and $B'^* = \{e'^1, \ldots, e'^n\}$ are
related by

(5.2)   $e'^i = e^i$   for $i \neq h$
        $e'^h = e^h - e^k$

for it is easily checked that (2.9) holds for B' and B'*. For the matrix a' of f
relative to B' we have

        $a'_{kk} = f(e'_k, e'_k) = f(e_h + e_k, e_h + e_k)$
        $\qquad = f(e_h, e_h) + 2 f(e_h, e_k) + f(e_k, e_k)$
        $\qquad = a_{hh} + 2 a_{hk} + a_{kk} = 2 a_{hk} \neq 0$

since we have assumed $a_{hh} = a_{kk} = 0$ and $a_{hk} \neq 0$.

We shall assume that this preliminary base change has been made, if necessary,
so that a itself has a nonzero diagonal element $a_{kk}$. By interchanging $e_1$ and $e_k$ in
the base B, we put that element in the 1, 1 location of the matrix. Therefore we
shall assume that $a_{11} \neq 0$.

We begin with (4.8) and we shall make a series of base changes in the dual space
V* of V. Separating out all the terms in (4.8) containing $e^1$ we obtain

        $Q = \sum_{i,j=1}^{n} a_{ij} e^i e^j$
        $\quad = a_{11}(e^1)^2 + 2(a_{12} e^1 e^2 + \cdots + a_{1n} e^1 e^n) + \sum_{i,j=2}^{n} a_{ij} e^i e^j$

using the fact that $a_{12} = a_{21}$, etc. We can rewrite this as

        $Q = a_{11}(e^1 + c_2 e^2 + \cdots + c_n e^n)^2 - a_{11}(c_2 e^2 + \cdots + c_n e^n)^2 + \sum_{i,j=2}^{n} a_{ij} e^i e^j$

where

(5.3)   $c_j = a_{1j} / a_{11}$   (j = 2, ..., n)

Put

(5.4)   $a'_{ij} = a_{ij} - a_{11} c_i c_j$   (i, j = 2, ..., n)

and

(5.5)   $v^1 = e^1 + c_2 e^2 + \cdots + c_n e^n$

Then it is quickly seen that

        $Q = a_{11}(v^1)^2 + \sum_{i,j=2}^{n} a'_{ij} e^i e^j$

Since the linear functions $e^1, \ldots, e^n$ on V form a base for V*, it is clear that the
set $\{v^1, e^2, \ldots, e^n\}$ is also a base for V*.

If not all the $a'_{ij}$ are zero, then we can repeat the same procedure for the quadratic
form

        $Q' = \sum_{i,j=2}^{n} a'_{ij} e^i e^j$

Thus, assuming that $a'_{22} \neq 0$ [otherwise a preliminary change of base as described
above in connection with (5.1) and (5.2) is necessary] we can put Q' in the form

        $Q' = a'_{22}(v^2)^2 + \sum_{i,j=3}^{n} a''_{ij} e^i e^j$

where $v^2$ is given by an expression of the type

(5.6)   $v^2 = e^2 + c'_3 e^3 + \cdots + c'_n e^n$

analogous to (5.5) but involving only $e^2, \ldots, e^n$. The elements $\{v^1, v^2, e^3, \ldots, e^n\}$
form a base for V*.

Continuing in this way, it is plain that after at most n - 1 steps we shall arrive
at a new base $B_1^* = \{v^1, v^2, \ldots, v^n\}$ for V* for which Q has the form

(5.7)   $Q = d_1(v^1)^2 + d_2(v^2)^2 + \cdots + d_n(v^n)^2$

Moreover it is clear from the steps above that if some $d_i$ is zero, then so are $d_{i+1}$,
$d_{i+2}$, etc.
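
The procedure just described can be sketched in Python. This is an illustration under stated assumptions rather than the authors' own formulation: it works in V itself instead of the dual space (it tracks a matrix `p` whose columns are the new base vectors), it uses exact rationals from the standard `fractions` module as the field K, and the names `gram` and `diagonalize` are ours.

```python
from fractions import Fraction


def gram(a, p):
    """Matrix (transpose of p) * a * p of the form in the base given by the columns of p."""
    n = len(a)
    return [[sum(p[h][i] * a[h][k] * p[k][j] for h in range(n) for k in range(n))
             for j in range(n)] for i in range(n)]


def diagonalize(a):
    """Return (p, d) such that (transpose of p) * a * p is diagonal with entries d.

    Assumes a is symmetric with entries in a field where 1 + 1 != 0.
    """
    n = len(a)
    a = [[Fraction(v) for v in row] for row in a]
    p = [[Fraction(int(i == j)) for j in range(n)] for i in range(n)]

    def add_col(dst, src, c):          # base vector dst <- dst + c * src
        for row in p:
            row[dst] += c * row[src]

    def swap_cols(i, j):               # interchange two base vectors
        for row in p:
            row[i], row[j] = row[j], row[i]

    for k in range(n):
        b = gram(a, p)
        if b[k][k] == 0:
            diag = [i for i in range(k + 1, n) if b[i][i] != 0]
            if diag:
                swap_cols(k, diag[0])
            else:
                off = [(h, j) for h in range(k, n)
                       for j in range(h + 1, n) if b[h][j] != 0]
                if not off:
                    break              # the remaining block is zero
                h, j = off[0]
                add_col(j, h, Fraction(1))   # as in (5.1): new diagonal entry 2 * b[h][j]
                swap_cols(k, j)
            b = gram(a, p)
        for j in range(k + 1, n):
            add_col(j, k, -b[k][j] / b[k][k])   # completing the square
    b = gram(a, p)
    return p, [b[i][i] for i in range(n)]


a = [[0, 1],
     [1, 0]]          # sample form Q = 2 x^1 x^2, all diagonal entries zero
p, d = diagonalize(a)
```

Recomputing the whole matrix with `gram` at each step is wasteful but keeps the sketch close to the argument: each pass either performs the preliminary change of base or removes the mixed terms involving one base vector.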


THEOREM 5.1 Let f be a symmetric bilinear form on an n-dimensional vector space V
over a field K in which $1 + 1 \neq 0$. Let Q be the quadratic form associated with f.
Then there exists a base $B_1 = \{v_1, \ldots, v_n\}$ for V with respect to which the matrix
of f and Q is a diagonal matrix. If f has rank r, then $B_1$ can be so chosen that its
first r diagonal elements $d_1, d_2, \ldots, d_r$ are nonzero, and then

(5.8)   $f = d_1\, v^1 \otimes v^1 + \cdots + d_r\, v^r \otimes v^r$

and

(5.9)   $Q = d_1 (v^1)^2 + \cdots + d_r (v^r)^2$

where $B_1^* = \{v^1, \ldots, v^n\}$ is the dual base of $B_1$.

Proof. Let $B_1^* = \{v^1, \ldots, v^n\}$ be the base for V* appearing in (5.7).
Now $B_1^*$ is certainly the dual of a base $B_1 = \{v_1, \ldots, v_n\}$ in V (see Exercise 2,
Sec. 2). Applying Theorem 3.1 to (5.7) above, we see that the matrix
of f relative to $B_1$ must be the diagonal matrix with $d_1, d_2, \ldots, d_n$ as
diagonal elements. From Definition 3.3 and the remark following (5.7)
it is clear that $d_j = 0$ for j > r. Finally, (5.9) is just (5.7) with the zero
terms omitted; (5.8) follows from (3.6). Q.E.D.

If x, y in V have components $x^1, \ldots, x^n$ and $y^1, \ldots, y^n$, respectively, relative
to the base $B_1$, then from (3.5) and (4.9), applied to $B_1$, we have

(5.10)  $f(x, y) = d_1 x^1 y^1 + \cdots + d_r x^r y^r$
(5.11)  $Q(x) = d_1 (x^1)^2 + \cdots + d_r (x^r)^2$

Determination of a base in V having the properties stated is usually called
diagonalization of the quadratic form Q, or reduction of Q to diagonal form.


COROLLARY 1 If K is the real number field R, then the base $B_1$ can be chosen so that
$d_i = 1$ for i = 1, ..., s and $d_i = -1$ for i = s + 1, ..., r, where $0 \leq s \leq r$.†
In particular, (5.9) becomes

(5.12)  $Q = (v^1)^2 + \cdots + (v^s)^2 - (v^{s+1})^2 - \cdots - (v^r)^2$

Proof. If $d_i > 0$ we simply replace the base vector $v_i$ by $d_i^{-1/2} v_i$ (and
$v^i$ by $d_i^{1/2} v^i$). If $d_i < 0$, we replace $v_i$ by $(-d_i)^{-1/2} v_i$ and $v^i$ by $(-d_i)^{1/2} v^i$.
By suitably renumbering these vectors if necessary, we achieve the desired
form (5.12). Q.E.D.


COROLLARY 2 If K is the complex field C, then the base $B_1$ can be chosen so that
$d_i = 1$ for i = 1, ..., r.

Proof. There is a complex number $h_i$ such that $h_i^2 = d_i$ (Theorem 3.3,
Chap. 5). Replace $v_i$ by $h_i^{-1} v_i$ for i = 1, ..., r, and $v^i$ by $h_i v^i$. Q.E.D.


COROLLARY 3 Let a be a symmetric $n \times n$ matrix with coefficients in a field K in
which $1 + 1 \neq 0$. Then there exists a nonsingular $n \times n$ matrix p with coefficients
in K such that

(5.13)  ${}^t p\, a\, p$

is a diagonal matrix.

Proof. We have only to take any base $B = \{e_1, \ldots, e_n\}$ in an
n-dimensional vector space over K and then define $f = a_{ij}\, e^i \otimes e^j$. Applying
Theorem 5.1, we can find a new base $B_1$ for which f has a diagonal
matrix. If p is the matrix from B to $B_1$, then (5.13) is that diagonal
matrix, by (3.9). Q.E.D.

† If s = 0 we understand that none of the $d_i$ is equal to +1; if s = r we understand that
none of the $d_i$ is equal to -1.


EXAMPLE 1 Let V be a three-dimensional vector space over R, and let Q be the
quadratic form on V whose matrix with respect to a given base $B = \{e_1, e_2, e_3\}$ is

        $\begin{pmatrix} 1 & 2 & -3 \\ 2 & 1 & 0 \\ -3 & 0 & 0 \end{pmatrix}$

Then, by (4.8),

        $Q = (e^1)^2 + 4 e^1 e^2 - 6 e^1 e^3 + (e^2)^2$

which we rewrite as

        $Q = (e^1 + 2e^2 - 3e^3)^2 - (2e^2 - 3e^3)^2 + (e^2)^2$
        $\quad = (e^1 + 2e^2 - 3e^3)^2 - 3(e^2)^2 + 12 e^2 e^3 - 9(e^3)^2$

Set

(5.14)  $v^1 = e^1 + 2e^2 - 3e^3$

Then

        $Q = (v^1)^2 - 3(e^2)^2 + 12 e^2 e^3 - 9(e^3)^2$

Repeating the process,

        $-3(e^2)^2 + 12 e^2 e^3 - 9(e^3)^2 = -3(e^2 - 2e^3)^2 + 12(e^3)^2 - 9(e^3)^2$

Put

(5.15)  $v^2 = \sqrt{3}\,(e^2 - 2e^3)$,   $v^3 = \sqrt{3}\, e^3$

The result is

        $Q = (v^1)^2 - (v^2)^2 + (v^3)^2$

which is, up to the order of the terms, the form promised by Corollary 1. The matrix
from $B^* = \{e^1, e^2, e^3\}$ to $B_1^* = \{v^1, v^2, v^3\}$ is

(5.16)  $\begin{pmatrix} 1 & 0 & 0 \\ 2 & \sqrt{3} & 0 \\ -3 & -2\sqrt{3} & \sqrt{3} \end{pmatrix}$

The inverse of this matrix transforms from B to the base in V of which $B_1^*$ is the
dual.
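
A computation of this kind is easy to get wrong by a sign, so a numerical spot check is worthwhile. The Python sketch below is our own addition: it compares the form given by the matrix of this example with the sum of squares obtained by completing the square twice, written with the factor 3 kept explicit so that only integer arithmetic is needed. The function names are hypothetical.

```python
from itertools import product

a = [[1, 2, -3],
     [2, 1, 0],
     [-3, 0, 0]]


def q_from_matrix(x):
    """Q(x) = a_ij x^i x^j for the matrix of the example."""
    return sum(a[i][j] * x[i] * x[j] for i in range(3) for j in range(3))


def q_completed(x):
    """The same form after completing the square twice."""
    w1 = x[0] + 2 * x[1] - 3 * x[2]   # value of v^1
    w2 = x[1] - 2 * x[2]              # value of v^2 divided by sqrt(3)
    w3 = x[2]                         # value of v^3 divided by sqrt(3)
    return w1 * w1 - 3 * w2 * w2 + 3 * w3 * w3


# Compare the two expressions on a grid of integer points.
agree = all(q_from_matrix(x) == q_completed(x)
            for x in product(range(-3, 4), repeat=3))
```

Agreement on such a grid is not a proof, but two quadratic polynomials that coincide on this many points are in fact identical.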


DEFINITION 5.1 Let Q be a quadratic form on a finite-dimensional vector space over
a field K in which $1 + 1 \neq 0$. Then the rank of Q is defined to be the same as the
rank of the symmetric bilinear function associated with Q (see Definition 3.3).

Thus the integer r in Theorem 5.1 is also the rank of Q. If the matrix of Q
with respect to some base in V is diagonal, then r is precisely the number of nonzero
diagonal terms, for by definition the rank of Q is the same as the rank of the
matrix of Q relative to any base in V.


DEFINITION 5.2 Let f be a symmetric bilinear function on an n-dimensional vector
space V over a field K in which $1 + 1 \neq 0$. Let Q be the quadratic form associated
with f. The set of all vectors x in V such that f(x, y) = 0 for every y in V is a subspace
N of V, called the nullspace of f or Q.

Clearly N is precisely the kernel of the mapping $T_1$ of (3.10). We have therefore,
by Theorem 3.2, Chap. 9, the following theorem:


THEOREM 5.2 The dimension of the nullspace of Q is equal to n - r, r being the rank
of Q.

If x is in the nullspace of Q, then Q(x) = f(x, x) = 0, clearly. But there may
be vectors x not in N for which Q(x) = 0.

We now look into the integer s appearing in Corollary 1 of Theorem 5.1 above.
Let $B_1 = \{v_1, \ldots, v_n\}$ and $B_2 = \{w_1, \ldots, w_n\}$ be two bases, for each of
which the matrix of Q is diagonal, consisting of $\pm 1$ or 0. Let $d_1, \ldots, d_n$ be
the diagonal elements in the matrix of Q relative to $B_1$, and let $d'_1, \ldots, d'_n$ be
the diagonal elements in the matrix of Q relative to $B_2$. Both sets contain exactly
r nonzero elements, r being the rank of Q. By suitably numbering the base elements
we can assume that $d_i = d'_i = 0$ for i > r and that

        $d_i = 1$ for i = 1, ..., s        $d_i = -1$ for i = s + 1, ..., r
        $d'_i = 1$ for i = 1, ..., t       $d'_i = -1$ for i = t + 1, ..., r

We want to show that s = t. Suppose then that s > t. If $x = x^1 v_1 + \cdots + x^s v_s$,
then $Q(x) = (x^1)^2 + \cdots + (x^s)^2$. Hence Q(x) > 0 if x is a nonzero element
in the subspace L spanned by $v_1, \ldots, v_s$. Similarly, $Q(y) \leq 0$ for any y in
the subspace M spanned by $w_{t+1}, \ldots, w_n$. Since dim L = s and dim M = n - t,
so that dim L + dim M > n, it follows that L and M must have a nonzero vector in
common if s > t, a contradiction. Interchanging the roles of the two bases rules out
t > s in the same way. We therefore have the following result, known as Sylvester's
law of inertia:


THEOREM 5.3 Let Q be a quadratic form on an n-dimensional vector space over the
real field R. If Q is reduced to diagonal form in any manner, then the number of
positive diagonal elements and the number of negative diagonal elements depend only
on Q, not on the particular base.
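
The law of inertia can be illustrated on a small case. In the Python sketch below, which is our own addition, the same real form is brought to diagonal form by two different changes of base; the diagonal entries differ, but the numbers of positive and negative entries do not. The helper names and the two matrices `p1`, `p2` are chosen for the illustration.

```python
def congruent(a, p):
    """Return (transpose of p) * a * p, the matrix of the form in the new base."""
    n = len(a)
    return [[sum(p[h][i] * a[h][k] * p[k][j] for h in range(n) for k in range(n))
             for j in range(n)] for i in range(n)]


def sign_counts(diagonal):
    """Numbers of positive and of negative entries in a list of reals."""
    positive = sum(1 for d in diagonal if d > 0)
    negative = sum(1 for d in diagonal if d < 0)
    return positive, negative


a = [[0, 1],
     [1, 0]]            # Q(x) = 2 x^1 x^2 on the real plane
p1 = [[1, 1],
      [1, -1]]          # new base (1, 1), (1, -1)
p2 = [[1, 1],
      [2, -2]]          # new base (1, 2), (1, -2)

b1 = congruent(a, p1)
b2 = congruent(a, p2)
counts1 = sign_counts([b1[0][0], b1[1][1]])
counts2 = sign_counts([b2[0][0], b2[1][1]])
```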

In the notation of Corollary 1 of Theorem 5.1, the number s (or sometimes 
+ —s) is called the signature of Q. 


EXERCISES

1. Let $B = \{e_1, e_2, e_3\}$ be a base for a vector space V over R. Define Q by

        $Q(x) = 2x^2 - 4xy + 2xz + 3y^2 - 2yz + 4z^2$

if $x = x e_1 + y e_2 + z e_3$. Find a base in V for which Q has the diagonal form
(5.12). What is the signature?
2. Find a nonsingular matrix a such that ‘aba is a diagonal matrix, where 


(2) 


3. Work Exercise 2 with

        $b = \begin{pmatrix} 1 & -1 & 2 \\ -1 & 3 & 0 \\ 2 & 0 & 4 \end{pmatrix}$

In the following exercises, $x_i$ denotes the linear function on $R^n$ that sends any
column vector into its ith entry (i = 1, ..., n).

4. Describe the locus of $x_1 x_2 = 1$ in $R^2$. That is, describe the set of elements
of $R^2$ which are mapped into 1 by the quadratic function $x_1 x_2$.
6. Describe the locus of x)? + 2xix2 + 2xix; + 2xe? + 2x,? = 1 in RY 
In the following exercises determine whether there is a nonzero vector v in $R^n$
such that Q(v) = 0.
7 Qa=xP tx? tet 
8. Q@ = x? +x? — xz in RY 
9. Q = x? + 2xixXe + 2mxy + 2x? + 2x,? in R% 
10. Q = x2 + 2xixe + Qmins + 2xe? in RS 
11. Find the nullspace of Q in Exercise 8. 
12. Find the nullspace of Q in Exercise 10. 


6. Hermitian forms; unitary mappings 
Throughout the rest of this chapter we deal exclusively with two special fields K:

        K = R, the field of real numbers
        K = C, the field of complex numbers

For any element a in K (hence a is a complex number, possibly real) we denote
by $\bar{a}$ the complex conjugate of a. The mapping $a \to \bar{a}$ is an isomorphism of K to
itself sending real numbers, and only those, into themselves (see Sec. 1, Chap. 5).

As we have seen in Sec. 11, Chap. 8, a quadratic form of a certain type on a
vector space over R gives a means of defining lengths and angles. A quadratic
form on a vector space over C is not suitable for this purpose. For if Q is such
a form and if Q(x) is a nonzero real number for some x, then $Q(cx) = c^2 Q(x) =
i\,Q(x)$ for a complex number c with $c^2 = i$, and this is a pure imaginary number.
Therefore if $Q \neq 0$, then Q(x) cannot possibly be a real number for all vectors in
the space. We therefore make a small modification.

Let V be a vector space over K (where K = R or K = C). Denote by $V \times V$
the set of all ordered pairs of vectors x, y in V. By a Hermitian form† H on V
is meant a mapping $V \times V \to K$ such that

(6.1)   H(x, y) is linear in x for any fixed y

(6.2)   $H(y, x) = \overline{H(x, y)}$   (complex conjugate)

More explicitly, (6.1) means

(6.3)   $H(x + v, y) = H(x, y) + H(v, y)$
        $H(ax, y) = a \cdot H(x, y)$

for any x, v, y in V and a in K. Using (6.2) and (6.3) we have

        $H(x, y + w) = \overline{H(y + w, x)} = \overline{H(y, x) + H(w, x)}$
        $\qquad\qquad\;\; = H(x, y) + H(x, w)$

Thus

(6.4)   $H(x, y + w) = H(x, y) + H(x, w)$

Further,

        $H(x, ay) = \overline{H(ay, x)} = \overline{a \cdot H(y, x)} = \bar{a} \cdot H(x, y)$

whence

(6.5)   $H(x, ay) = \bar{a} \cdot H(x, y)$

If K = R, it is clear that H is simply a symmetric bilinear function. If K = C,
it differs from a bilinear symmetric function only by (6.2) and (6.5).


EXAMPLE 1 Let x, y denote elements of $K^n$ (column vectors). Then

        $H(x, y) = x^1 \bar{y}^1 + x^2 \bar{y}^2 + \cdots + x^n \bar{y}^n$

is a Hermitian form. We call it the cartesian Hermitian form for $K^n$. If K = R,
this is the same as the bilinear form of Example 2, Sec. 3.

The point of Hermitian forms is the following:

(6.6)   H(x, x) is a real number for any x.

For by (6.2) we have $H(x, x) = \overline{H(x, x)}$. H is called positive definite if H(x, x) > 0
for all $x \neq 0$.

It should be apparent that most of the things worked out in the foregoing sections
for symmetric bilinear forms will hold, with minor modifications, for Hermitian
forms. We shall point out the main facts briefly, mentioning again that, for K =
R, they are merely repetitions of the corresponding statements in the preceding
sections. Most proofs are left to the reader.

† After the French mathematician Charles Hermite (1822-1902).

If H is a Hermitian form on an n-dimensional vector space V, and if
$B = \{e_1, \ldots, e_n\}$ is a base for V, then the matrix $a = (a_{ij})$ of H relative to B is
defined by

(6.7)   $a_{ij} = H(e_i, e_j)$

From (6.2) we have

(6.8)   $a_{ji} = \bar{a}_{ij}$   or   ${}^t a = \bar{a}$

A matrix with this property is called Hermitian. If K = R, this is just the condition
for a to be symmetric.

If $x = x^i e_i$ and $y = y^j e_j$, then

(6.9)   $H(x, y) = a_{ij} x^i \bar{y}^j$

If $e'_i = p^j_i e_j$ (i = 1, ..., n) form a new base B' for V, then the matrix a' of H
relative to B' is obtained from a by

(6.10)  $a' = {}^t p\, a\, \bar{p}$

We define the rank of H to be the rank of its matrix relative to any base in V.
Thus H has rank n if and only if det $a \neq 0$.

If H is positive definite, then its rank must be equal to dim V. For if det a = 0,
then there exist numbers $x^1, \ldots, x^n$, not all zero, such that $a_{ij} x^i = 0$ for
j = 1, ..., n. But then H(x, x) = 0, by (6.9), for the nonzero vector $x = x^i e_i$.
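
Formulas (6.8) and (6.9) can be tried out with Python's built-in complex numbers. The sketch below is an illustration only: the function names and the sample Hermitian matrix are our own choices, and the entries are small Gaussian integers so that the floating-point arithmetic is exact.

```python
def hermitian_form(a, x, y):
    """H(x, y) = a_ij x^i conj(y^j), as in formula (6.9)."""
    n = len(a)
    return sum(a[i][j] * x[i] * y[j].conjugate()
               for i in range(n) for j in range(n))


def is_hermitian(a):
    """True when a_ji is the complex conjugate of a_ij, as in (6.8)."""
    n = len(a)
    return all(a[j][i] == a[i][j].conjugate()
               for i in range(n) for j in range(n))


# Sample data chosen for this sketch.
a = [[2 + 0j, 1 + 1j],
     [1 - 1j, 3 + 0j]]
x = [1 + 2j, 0 - 1j]
y = [2 + 0j, 1 + 1j]

h_xy = hermitian_form(a, x, y)
h_yx = hermitian_form(a, y, x)
h_xx = hermitian_form(a, x, x)
```

With a Hermitian matrix the values `h_xy` and `h_yx` are complex conjugates of one another, and `h_xx` has zero imaginary part, in agreement with (6.2) and (6.6).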

Now let g be a linear function on V. By $\bar{g}$ we denote the mapping $V \to K$
defined by

(6.11)  $\bar{g}(x) = \overline{g(x)}$

If f is another linear function on V, then by $f\bar{g}$ we denote the mapping $V \to K$
defined by

(6.12)  $(f\bar{g})(x) = f(x)\,\bar{g}(x) = f(x)\,\overline{g(x)}$

With this notation we have

(6.13)  $H = a_{ij}\, e^i \bar{e}^j$

where $B^* = \{e^1, \ldots, e^n\}$ is the dual base of B above.


EXAMPLE 2 Let $e^i$ denote the linear function on $K^n$ sending a column vector x into
its ith component $x^i$. Then the cartesian form of Example 1 can be written

(6.14)  $H = e^1 \bar{e}^1 + \cdots + e^n \bar{e}^n$

The matrix of this form with respect to the canonical base $B = \{e_1, \ldots, e_n\}$ of
$K^n$ is the unit matrix $I_n$. This form is visibly positive definite.


A Hermitian form H on an n-dimensional space V can be reduced to diagonal
form by essentially the same argument used in Sec. 5 for bilinear forms. For example,
take the case n = 2, for which (6.13) is

        $H = a_{11} e^1 \bar{e}^1 + a_{12} e^1 \bar{e}^2 + a_{21} e^2 \bar{e}^1 + a_{22} e^2 \bar{e}^2$

By a preliminary change of base in V, as described in Sec. 5, we can arrange things
so that $a_{11} \neq 0$. Recall that $a_{ij} = \bar{a}_{ji}$; in particular, $a_{11}$ and $a_{22}$ are real. Now put

        $v^1 = e^1 + c e^2$   with $c = a_{21} / a_{11}$

Then

        $a_{11} v^1 \bar{v}^1 = a_{11}(e^1 + c e^2)(\bar{e}^1 + \bar{c}\, \bar{e}^2)$
        $\qquad\quad\;\; = a_{11} e^1 \bar{e}^1 + a_{11} c\, e^2 \bar{e}^1 + a_{11} \bar{c}\, e^1 \bar{e}^2 + a_{11} c \bar{c}\, e^2 \bar{e}^2$

By definition, $a_{11} c = a_{21}$, and we have $a_{11} \bar{c} = \overline{a_{11} c} = \bar{a}_{21} = a_{12}$ (since $a_{11}$ is real).
Hence,

        $a_{11} v^1 \bar{v}^1 = a_{11} e^1 \bar{e}^1 + a_{21} e^2 \bar{e}^1 + a_{12} e^1 \bar{e}^2 + a_{11} c \bar{c}\, e^2 \bar{e}^2$

Hence,

        $H = a_{11} v^1 \bar{v}^1 + (a_{22} - a_{11} c \bar{c})\, e^2 \bar{e}^2$

A similar argument holds for n > 2, and it is easily seen that one can find a base
$v^1, \ldots, v^n$ in the dual space of V such that

        $H = d_1 v^1 \bar{v}^1 + \cdots + d_r v^r \bar{v}^r$

where r is the rank of H, and where $d_1, \ldots, d_r$ are nonzero real numbers. Putting
$u^i = \sqrt{d_i}\, v^i$ if $d_i > 0$ and $u^i = \sqrt{-d_i}\, v^i$ if $d_i < 0$, we then obtain (after a suitable
renumbering, if necessary)

(6.15)  $H = u^1 \bar{u}^1 + \cdots + u^s \bar{u}^s - u^{s+1} \bar{u}^{s+1} - \cdots - u^r \bar{u}^r$

The number s is called the signature of H (or of the matrix of H relative to any
base); and s does not depend on the particular base change used to put H in the
diagonal form (6.15).


H again denoting a Hermitian form on an n-dimensional vector space V over K
(= R or = C), a linear mapping $T: V \to V$ is called H-unitary if

(6.16)  $H(Tx, Ty) = H(x, y)$

for all x, y in V, where we have written simply Tx for T(x), etc. (For the case
K = R the term H-orthogonal is often used instead of H-unitary.)

Let $B = \{e_1, \ldots, e_n\}$ be a base for V, and let a be the corresponding matrix of
H. Further, let $c = (c^i_j)$ be the matrix of T relative to B, so that $T e_i = c^h_i e_h$.
Putting $x = e_i$ and $y = e_j$ in (6.16) we obtain

        $H(c^h_i e_h, c^k_j e_k) = H(e_i, e_j)$

or

(6.17)  $c^h_i\, a_{hk}\, \bar{c}^k_j = a_{ij}$

or

(6.18)  ${}^t c\, a\, \bar{c} = a$

It is easy to verify conversely that if (6.18) holds, then T is H-unitary. If H has
rank n, that is, if det $a \neq 0$, then from (6.18) we conclude that det $c \neq 0$, and so T
must be a one-to-one mapping, that is, an isomorphism of V. Then $T^{-1}$ is defined,
and from (6.16) applied to $T^{-1}x$ and $T^{-1}y$ in place of x and y, one sees immediately
that $T^{-1}$ is also H-unitary. Furthermore, if S and T are both H-unitary, then so is
$S \circ T$. For

        $H(S \circ Tx, S \circ Ty) = H(Tx, Ty) = H(x, y)$

by (6.16), applied twice.

Therefore, if H is nondegenerate (i.e., if its rank is equal to dim V), then all the
H-unitary mappings of V to itself form a group, with composition of mappings as
group operation. That group is called the unitary group of H.†

As a particularly important special case, suppose that the matrix of H relative
to the given base in V is the unit matrix, as in Example 2 above (if H is positive
definite, then such a base can always be found by the diagonalization procedure
outlined above). Then (6.18) boils down to

6.19    ${}^t c\, \bar{c} = I$

Taking complex conjugates of both sides, we have

${}^t \bar{c}\, c = I$

since I is a real matrix. Thus (6.19) can also be written as

6.20    $c^{-1} = {}^t \bar{c}$

A complex matrix satisfying this condition is called a unitary matrix (or sometimes
an orthogonal matrix, if c is real).


EXAMPLE 3 The matrix

$\begin{pmatrix} \cos\alpha & -\sin\alpha \\ \sin\alpha & \cos\alpha \end{pmatrix}$

is an orthogonal matrix for any real number $\alpha$.
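
The claim of Example 3 can be checked numerically. The following is a minimal Python sketch, not part of the original text; the helper names `rotation`, `transpose`, `matmul`, and `is_orthogonal` are hypothetical, and the test is the real case of (6.19), namely that the transpose of c times c is the unit matrix up to rounding.

```python
import math

def rotation(alpha):
    """The 2 x 2 matrix of Example 3, stored as a list of rows."""
    return [[math.cos(alpha), -math.sin(alpha)],
            [math.sin(alpha),  math.cos(alpha)]]

def transpose(m):
    return [list(row) for row in zip(*m)]

def matmul(a, b):
    """Ordinary matrix product of two matrices given as lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]

def is_orthogonal(c, tol=1e-12):
    """Real case of (6.19): transpose(c) * c equals the unit matrix."""
    p = matmul(transpose(c), c)
    n = len(c)
    return all(abs(p[i][j] - (1.0 if i == j else 0.0)) < tol
               for i in range(n) for j in range(n))
```

For instance, `is_orthogonal(rotation(0.7))` holds, while a shear such as `[[1.0, 1.0], [0.0, 1.0]]` fails the test.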
Since a matrix and its inverse commute, (6.19) gives us

${}^t c\, \bar{c} = \bar{c}\, {}^t c = I$

Written out these equations become

6.21    $\sum_j \bar{c}^i_j\, c^k_j = \delta_{ik} = \begin{cases} 1 & \text{if } i = k \\ 0 & \text{if } i \neq k \end{cases}$

and

6.22    $\sum_j c^j_i\, \bar{c}^j_k = \delta_{ik}$

These relations are sometimes described by saying that the columns (or the rows)
of c are mutually orthogonal unit vectors. We shall go into this terminology in the
next section.

† If K = R, it is also called the orthogonal group of H.


THEOREM 6.1 Let H be a Hermitian form on a vector space V. A linear mapping
T of V to itself is H-unitary if and only if

6.23    $H(Tx, Tx) = H(x, x)$

for all x in V.


Proof. From (6.16) it is clear that (6.23) must hold if T is H-unitary.
Suppose now that (6.23) holds. Apply (6.23) to x + y. Using (6.1) to
(6.5) we have

$H(T(x + y), T(x + y)) = H(Tx, Tx) + H(Tx, Ty) + H(Ty, Tx) + H(Ty, Ty)$

The left side here is equal to H(x + y, x + y), by (6.23). Expanding this
out and using (6.23) on the right of the expression above, one gets

6.24    $H(x, y) + H(y, x) = H(Tx, Ty) + H(Ty, Tx)$

In the case of the real field, H is symmetric, and the last equation reduces to

$2H(x, y) = 2H(Tx, Ty)$

from which (6.16) follows at once. For the case of the complex field, replace
x in (6.24) by ix, where $i = \sqrt{-1}$. Using (6.3) and (6.5) one obtains
easily

$H(x, y) - H(y, x) = H(Tx, Ty) - H(Ty, Tx)$

Adding to (6.24) we obtain (6.16). Q.E.D.
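
The idea behind this proof, that the values H(v, v) alone determine H(x, y), can be made explicit. The following minimal Python sketch is not from the original text; it assumes H is linear in its first argument and conjugate-linear in its second, and uses the standard polarization identity $H(x, y) = \tfrac{1}{4} \sum_{k=0}^{3} i^k H(x + i^k y, x + i^k y)$. The names `hermitian_form` and `polarize` are hypothetical.

```python
def hermitian_form(a):
    """H(x, y) = sum over i, j of a[i][j] * x[i] * conj(y[j]), a Hermitian."""
    def H(x, y):
        return sum(a[i][j] * x[i] * complex(y[j]).conjugate()
                   for i in range(len(x)) for j in range(len(y)))
    return H

def polarize(q):
    """Rebuild H(x, y) from the function q(v) = H(v, v) alone."""
    units = [1, 1j, -1, -1j]          # the four powers of i
    def H(x, y):
        total = 0
        for u in units:
            v = [xi + u * yi for xi, yi in zip(x, y)]
            total += u * q(v)
        return total / 4
    return H
```

Since an H-unitary T leaves q unchanged, it leaves the rebuilt form unchanged as well, which is the content of Theorem 6.1.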


EXAMPLE 4 Let H be the Hermitian (or quadratic) form on the vector space $R^2$
defined by

$H(v, v) = x^2 + y^2$    for $v = \begin{pmatrix} x \\ y \end{pmatrix}$

Let T be the linear mapping defined by

$Tv = \begin{pmatrix} a & b \\ c & d \end{pmatrix} \begin{pmatrix} x \\ y \end{pmatrix} = \begin{pmatrix} ax + by \\ cx + dy \end{pmatrix}$

Then

$H(Tv, Tv) = (ax + by)^2 + (cx + dy)^2$
$= (a^2 + c^2)x^2 + 2(ab + cd)xy + (b^2 + d^2)y^2$

For T to be H-unitary (or orthogonal) it is necessary and sufficient that

$a^2 + c^2 = 1$
$ab + cd = 0$
$b^2 + d^2 = 1$

[These are just Eq. (6.22) for this special case.] From the first equation $a^2 + c^2 = 1$
it follows that $-1 \le a \le 1$, since a, c are real. We can then find a number $\alpha$ such
that $a = \cos\alpha$, $c = \sin\alpha$. From the other equations it is easy to deduce that
either $b = -\sin\alpha$, $d = \cos\alpha$ or $b = \sin\alpha$, $d = -\cos\alpha$. In the first case the
matrix of T is of the type of Example 3, and T can be thought of as a rotation about a
point in the euclidean plane; in the second case T is a reflection in a line through
that point.


EXAMPLE 5 Let H be the Hermitian form

$H(x, y) = x^1 y^1 + x^2 y^2 + x^3 y^3$

on the vector space $R^3$. The H-unitary transformations can be thought of as rigid
motions about a fixed point in euclidean 3-space.


EXAMPLE 6 Let H be the Hermitian form on $R^4$ defined by

$H(x, y) = x^1 y^1 - x^2 y^2 - x^3 y^3 - x^4 y^4$

The unitary group of H is called the Lorentz group. It is of great importance in
relativity theory.
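
As an illustration of Example 6, here is a minimal Python sketch, not part of the original text, of one familiar element of the Lorentz group: a hyperbolic rotation mixing the first two coordinates. The function names `H`, `boost`, and `apply` and the parameter `phi` are hypothetical choices made for this sketch.

```python
import math

def H(x, y):
    """The form of Example 6 on R^4 (vectors as 0-based lists of length 4)."""
    return x[0] * y[0] - x[1] * y[1] - x[2] * y[2] - x[3] * y[3]

def boost(phi):
    """A hyperbolic rotation in the first two coordinates."""
    ch, sh = math.cosh(phi), math.sinh(phi)
    return [[ch, sh, 0.0, 0.0],
            [sh, ch, 0.0, 0.0],
            [0.0, 0.0, 1.0, 0.0],
            [0.0, 0.0, 0.0, 1.0]]

def apply(m, x):
    """Matrix times column vector."""
    return [sum(m[i][j] * x[j] for j in range(4)) for i in range(4)]
```

Because $\cosh^2\varphi - \sinh^2\varphi = 1$, one finds H(Tx, Ty) = H(x, y) for T = `boost(phi)`, so T is H-unitary in the sense of (6.16).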


EXERCISES 
1. Let H be a Hermitian form of rank r on a vector space V of dimension n over
the field of complex numbers. Prove that there is a base $\{x^1, x^2, \ldots, x^n\}$ in V*
such that

$H(v, w) = a_1 x^1(v) \overline{x^1(w)} + \cdots + a_r x^r(v) \overline{x^r(w)}$

where

$a_i = \pm 1$


2. In Exercise 1 prove that the number of $a_i$ equal to +1 is independent of the
particular base.

3. Let $B = \{e_1, e_2, e_3\}$ be a base for a vector space V over C, and let H be a
Hermitian form on V. Let its matrix relative to B be


$\begin{pmatrix} 1 & 0 & i \\ 0 & 2 & 3+i \\ -i & 3-i & -1 \end{pmatrix}$


Find a new base in V for which the matrix of H is diagonal. 


7. Euclidean vector spaces


In this section we shall go briefly into matters discussed in Sec. 11, Chap. 8. By a
euclidean vector space E we mean a vector space over the field K of real or complex
numbers on which there is prescribed a positive definite Hermitian form H. The
scalar H(x, y) is called the inner product of the vectors x and y in E. Instead of
writing H(x, y) we shall often use the simpler notation (x, y). We recall the definition:

7.1    (x, y) is linear in x.
7.2    $(x, y) = \overline{(y, x)}$
7.3    $(x, x) \ge 0$, and (x, x) = 0 if and only if x = 0.

For any element x in E we define the length |x| of x by

7.4    $|x| = \sqrt{(x, x)}$

By (7.3) we have |x| > 0 unless x = 0.


EXAMPLE 1 The vector space $R_n$ becomes a euclidean vector space if we define the
inner product (x, y) of two row vectors $x = (x_1, \ldots, x_n)$ and $y = (y_1, \ldots, y_n)$
by

$(x, y) = x_1 y_1 + \cdots + x_n y_n$


EXAMPLE 2 The vector space $C_n$ becomes a euclidean vector space if we define the
inner product (x, y) of two row vectors by

$(x, y) = x_1 \bar{y}_1 + \cdots + x_n \bar{y}_n$


If x and y are real, then this is the same as in the foregoing example.
We shall refer to these inner products in $R_n$ and $C_n$ as the standard inner products.
A linear mapping $T: E \to E$ of a euclidean vector space is called unitary if

$(Tx, Ty) = (x, y)$

for all x, y in E. (The term orthogonal is often used if the scalar field is R.) If E is
infinite-dimensional, one adds the condition that T maps E onto E.


THEOREM 7.1 (Schwarz inequality) For any two vectors x, y in E,

7.5    $|(x, y)| \le |x| \cdot |y|$

equality holding if and only if one vector is a scalar multiple of the other.


Proof. The argument is essentially the same as that for Theorem 11.1,
Chap. 8, except that (x, y) may not be a real number here. First of all, if
(x, y) = 0, (7.5) clearly holds. Assume then that p = (x, y) is not zero,
and set x′ = qx, where $q = |p|/p$. We have $|q| = |p|/|p| = 1$, clearly. Then,
using (7.1),

$|(x', y)| = |(qx, y)| = |q \cdot (x, y)| = |q| \cdot |(x, y)| = |(x, y)|$

From (7.1) and (7.2) we have

$(x', x') = (qx, qx) = q\bar{q}\,(x, x) = |q|^2 \cdot |x|^2 = |x|^2$

and so $|x'|^2 = |x|^2$. Therefore the right and left members of (7.5) remain
unchanged if we replace x by x′. The advantage of this is that (x′, y) is
real. For

$(x', y) = (qx, y) = q\,(x, y) = qp = |p|$

by definition of p and q. Hence (x′, y) = (y, x′). Now let t be any real
number. We have

$(x' + ty, x' + ty) \ge 0$

by (7.3). Expanding out the expression by use of (7.1) and (7.2) we obtain

$(x', x') + t(x', y) + t(y, x') + t^2 (y, y) \ge 0$

By what was just said, this is the same as

$(x', x') + 2t(x', y) + t^2 (y, y) \ge 0$

or

$|x'|^2 + 2t(x', y) + t^2 \cdot |y|^2 \ge 0$

Set

$a = |y|^2$    $b = 2(x', y)$    $c = |x'|^2$

These are real numbers, and we have

$at^2 + bt + c \ge 0$

for all real numbers t. Therefore the quadratic on the left cannot have two
distinct real roots. From the quadratic formula it follows that

$b^2 - 4ac \le 0$

or

$4(x', y)^2 - 4|x'|^2 \cdot |y|^2 \le 0$

or

$(x', y)^2 \le |x'|^2 \cdot |y|^2$

whence

$|(x', y)| \le |x'| \cdot |y|$

which is what we wanted to prove. Suppose now that equality holds in
(7.5). If p = (x, y) = 0, then |x| · |y| = 0, and so either x or y is the zero
vector. If p ≠ 0, then by our calculations above we have $b^2 - 4ac = 0$.
In this case $at^2 + bt + c$ has two equal real roots $t_0$, and so
$(x' + t_0 y, x' + t_0 y) = |x' + t_0 y|^2 = 0$, whence $x' + t_0 y = 0$, or $x = -(t_0/q)\,y$,
a scalar multiple of y. The converse is trivial to prove. Q.E.D.


As an immediate consequence of Schwarz’s inequality we have the following 
theorem: 


THEOREM 7.2 (Triangle inequality) For any vectors x, y in the euclidean vector
space E,

7.6    $|x + y| \le |x| + |y|$

Proof. We have from (7.1), (7.2), (7.4),

$|x + y|^2 = (x + y, x + y)$
$= (x, x) + (x, y) + (y, x) + (y, y)$
$\le (x, x) + |(x, y)| + |(y, x)| + (y, y)$
$= |x|^2 + 2|(x, y)| + |y|^2$
$\le |x|^2 + 2|x| \cdot |y| + |y|^2$    (Schwarz inequality)
$= (|x| + |y|)^2$

Taking square roots we obtain the conclusion. Q.E.D.


Just as in Sec. 11, Chap. 8, we can use the Schwarz inequality to define angles
between two vectors in a euclidean space E. Thus, let x, y be two nonzero vectors.
Then the number

$a = \text{real part of } \dfrac{(x, y)}{|x| \cdot |y|}$

is real, and $-1 \le a \le 1$, by the Schwarz inequality. Therefore there is a uniquely
determined number $\theta$ such that $0 \le \theta \le \pi$ and

$\cos\theta = a$

This number $\theta$ is defined to be the angle between x and y and will sometimes be
denoted by $\angle(x, y)$.
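
The definition of the angle translates directly into a computation. The following minimal Python sketch is not from the original text; it assumes the standard inner product on $C_n$ of Example 2, and the names `inner`, `length`, and `angle` are hypothetical.

```python
import math

def inner(x, y):
    """Standard inner product on C_n: sum of x_i * conj(y_i)."""
    return sum(complex(a) * complex(b).conjugate() for a, b in zip(x, y))

def length(x):
    """|x| = sqrt((x, x)); the inner product (x, x) is real and nonnegative."""
    return math.sqrt(inner(x, x).real)

def angle(x, y):
    """Angle between nonzero x and y: arccos of Re (x, y) / (|x| |y|)."""
    a = inner(x, y).real / (length(x) * length(y))
    a = max(-1.0, min(1.0, a))   # guard against rounding just outside [-1, 1]
    return math.acos(a)
```

With x = (i, 0) and y = (1, 0) the inner product is i, which is not zero, yet the angle is π/2, because only the real part of (x, y) enters the definition.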

In the case of a euclidean vector space E over the real field, it follows at once
from the definition that

7.7    $(x, y) = |x| \cdot |y| \cos \angle(x, y)$

For a vector space over the complex field one has

7.8    $\tfrac{1}{2}\big((x, y) + (y, x)\big) = |x| \cdot |y| \cdot \cos \angle(x, y)$

For any euclidean vector space it is easy to deduce the law of cosines:

7.9    $|x + y|^2 = |x|^2 + |y|^2 + 2|x| \cdot |y| \cdot \cos \angle(x, y)$

This is equivalent to the following identity, which holds for any three vectors
u, x, y such that u ≠ x and u ≠ y:

7.10    $|x - y|^2 = |x - u|^2 + |y - u|^2 - 2 \cdot |x - u| \cdot |y - u| \cdot \cos \angle(x - u, y - u)$

The number |x − y| is called the distance between x and y.


EXERCISES 
1. For any vectors v and w in a euclidean vector space E, prove: $|v| - |w| \le |v - w|$.
2. Prove the law of cosines identity (7.9).
3. Prove identity (7.10).
4. In a euclidean vector space, the notion of angle can be defined explicitly in
terms of length. What is this definition?
5. Let v be a unit vector in a euclidean space E, that is, |v| = 1. For any vector
u in E, we set

$\mathrm{Proj}_v\, u = (u, v)v$

(called the projection of u on v). Prove

$\mathrm{Proj}_v (u_1 + u_2 + \cdots + u_n) = \mathrm{Proj}_v\, u_1 + \mathrm{Proj}_v\, u_2 + \cdots + \mathrm{Proj}_v\, u_n$

Interpret this geometrically.
6. Let E be a euclidean space, and let $T: E \to E$ be a mapping such that

$|Tv| = |v|$

for all v in E. Prove that T is linear and hence unitary.


8. Orthonormal bases 


Throughout this section we deal with a euclidean vector space E over the field K,
where either K = R or K = C. The inner product of two vectors x, y of E will be
denoted by (x, y).


DEFINITION 8.1 Two vectors x, y of E are called orthogonal if (x, y) = 0; they are
called perpendicular if

Real part of (x, y) = 0

If K is the real field, then these two notions are clearly the same. But if K is the
complex field, they are different. For orthogonal vectors are certainly perpendicular,
but perpendicular vectors need not be orthogonal. For example,
$\cos \angle(\sqrt{-1}\,v, v) = 0$ if v ≠ 0, so that $\angle(\sqrt{-1}\,v, v) = \pi/2$. But
$(\sqrt{-1}\,v, v) = \sqrt{-1}\,|v|^2 \neq 0$.


DEFINITION 8.2 Two subspaces U, V of E are said to be orthogonal if (u, v) = 0 for
all u in U and v in V.

For any subspace U of E we denote by $U^\perp$ the set of all vectors in E which are
orthogonal to every element of U. It is quickly verified that $U^\perp$ is a subspace of E.
It is called the orthogonal complement of U.


DEFINITION 8.3 A set of vectors $u_1, u_2, u_3$, etc., is called orthonormal if

8.1    $(u_i, u_j) = 0$ for $i \neq j$,    $|u_i| = 1$    for each i and j

In particular, if E has finite dimension n, then a base $\{e_1, \ldots, e_n\}$ is called
orthonormal if (8.1) holds for these vectors.


PROPOSITION 8.1 Orthonormal vectors are linearly independent.
Proof. Suppose that $a_1 u_1 + \cdots + a_r u_r = 0$, the $u_i$ being orthonormal.
Take the inner product of the left side with $u_1$:

$(a_1 u_1 + \cdots + a_r u_r, u_1) = (0, u_1) = 0$

or

$a_1 (u_1, u_1) + a_2 (u_2, u_1) + \cdots + a_r (u_r, u_1) = 0$

By (8.1) this reduces to $a_1 = 0$. Similarly, replacing $u_1$ on the right by $u_k$,
we get $a_k = 0$ (k = 1, ..., r). Q.E.D.

PROPOSITION 8.2 If u is a nonzero vector in E, then

$v = \dfrac{1}{|u|} \cdot u$

is a unit vector; that is, |v| = 1.

Proof. $(v, v) = \left( \dfrac{1}{|u|} u, \dfrac{1}{|u|} u \right) = \dfrac{1}{|u|^2} \cdot (u, u) = 1$.

We say that v is obtained from u by normalization.


THEOREM 8.3 If $v_1, \ldots, v_r$ are orthonormal vectors in E and if x lies in the
subspace which they span, then

8.2    $x = (x, v_1)v_1 + \cdots + (x, v_r)v_r$

Proof. By assumption, $x = c_1 v_1 + \cdots + c_r v_r$ for certain scalars $c_i$.
Then

$(x, v_k) = (c_1 v_1 + \cdots + c_r v_r, v_k)$
$= c_1 (v_1, v_k) + \cdots + c_k (v_k, v_k) + \cdots + c_r (v_r, v_k)$
$= c_k$    Q.E.D.


THEOREM 8.4 If $u_1, u_2, u_3, \ldots$ are linearly independent vectors in E, then the
vectors $v_1, v_2, v_3, \ldots$ given by the formulas below form an orthonormal set which
spans the same subspace of E as the $u_i$:

$v_1 = \dfrac{u_1}{|u_1|}$

$v_2 = \dfrac{u_2 - (u_2, v_1)v_1}{|u_2 - (u_2, v_1)v_1|}$

$v_3 = \dfrac{u_3 - (u_3, v_1)v_1 - (u_3, v_2)v_2}{|u_3 - (u_3, v_1)v_1 - (u_3, v_2)v_2|}$

. . .

$v_k = \dfrac{u_k - (u_k, v_1)v_1 - (u_k, v_2)v_2 - \cdots - (u_k, v_{k-1})v_{k-1}}{|u_k - (u_k, v_1)v_1 - (u_k, v_2)v_2 - \cdots - (u_k, v_{k-1})v_{k-1}|}$


Proof. In each denominator stands the length of the vector in the
numerator. Therefore, by Proposition 8.2, the v's will all be unit vectors,
provided that the numerators are not zero. Consider $v_k$. From the
formula, it is a linear combination of $u_k$ and $v_1, v_2, \ldots, v_{k-1}$. By a
simple induction it follows that the numerator in $v_k$ is a linear combination
of $u_1, u_2, \ldots, u_k$ in which $u_k$ has coefficient 1, hence cannot be zero
because of the assumed linear independence of the latter. Therefore the
formulas above all make sense; their denominators cannot be zero. It is also
clear that $u_k$ is a linear combination of $v_1, \ldots, v_k$, showing that
$\{u_1, \ldots, u_k\}$ and $\{v_1, \ldots, v_k\}$ span the same subspace of E, for any k.
To show that the v's are orthonormal we use induction. We show that the set
$v_1, v_2, \ldots, v_k$ is orthonormal for any k (that is, any k not exceeding the
number of $u_i$ given to start with). For k = 1 the assertion is trivial. Suppose
it holds for k − 1, so that $v_1, \ldots, v_{k-1}$ are orthonormal. Denoting the
denominator in the formula for $v_k$ by $a_k$, we have

$a_k v_k = u_k - c_1 v_1 - \cdots - c_{k-1} v_{k-1}$

where $c_i = (u_k, v_i)$. Then, for j < k,

$a_k (v_k, v_j) = (u_k, v_j) - c_1 (v_1, v_j) - \cdots - c_{k-1} (v_{k-1}, v_j)$

By assumption, the only nonzero term on the right, after the first, is

$-c_j (v_j, v_j) = -c_j = -(u_k, v_j)$

Hence $(a_k v_k, v_j) = 0$, and so $(v_k, v_j) = 0$ for j < k. As already noted,
$(v_k, v_k) = 1$. Hence $v_1, \ldots, v_k$ form an orthonormal set, and by induction
this is true for any k. Q.E.D.


REMARK. This procedure for obtaining an orthonormal set from any linearly
independent set of vectors is called the Schmidt orthonormalization process.
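
The formulas of Theorem 8.4 can be carried out mechanically. Here is a minimal Python sketch, not part of the original text, for vectors in $C_n$ with the standard inner product; the names `inner` and `schmidt` and the tolerance `1e-12` are assumptions of this sketch.

```python
import math

def inner(x, y):
    """Standard inner product on C_n: sum of x_i * conj(y_i)."""
    return sum(complex(a) * complex(b).conjugate() for a, b in zip(x, y))

def schmidt(us):
    """Apply the formulas of Theorem 8.4 to a list of independent vectors."""
    vs = []
    for u in us:
        w = [complex(c) for c in u]
        for v in vs:
            c = inner(u, v)                    # the coefficient (u_k, v_j)
            w = [wi - c * vi for wi, vi in zip(w, v)]
        norm = math.sqrt(inner(w, w).real)     # length of the numerator
        if norm < 1e-12:
            raise ValueError("vectors are (numerically) linearly dependent")
        vs.append([wi / norm for wi in w])
    return vs
```

For example, `schmidt([[1, 1, 0], [1, 0, 1], [0, 1, 1]])` returns three vectors whose pairwise inner products are 0 and whose lengths are 1, up to rounding. In floating-point work the formulas as stated can lose accuracy for nearly dependent vectors, which is why the sketch rejects very small numerators.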


As an immediate corollary we have the following theorem:


THEOREM 8.5 If the euclidean vector space E has finite dimension, then E has an
orthonormal base.

Proof. One has only to apply the Schmidt process to any base for E.

Suppose then that dim E = n, and let $\{e_1, \ldots, e_n\}$ be an orthonormal
base. Let

$x = x^i e_i$    $y = y^j e_j$

be two vectors in E. We have

$(x, y) = (x^i e_i, y) = x^i (e_i, y)$

But

$(e_i, y) = \overline{(y, e_i)} = \overline{(y^j e_j, e_i)} = \overline{y^j (e_j, e_i)} = \bar{y}^j (e_i, e_j)$

Therefore,

8.3    $(x, y) = x^1 \bar{y}^1 + \cdots + x^n \bar{y}^n$

It follows that if E′ is another euclidean vector space of dimension n over the same
field (R or C), and if $e'_1, \ldots, e'_n$ is an orthonormal base of E′, then the mapping
$T: E \to E'$ defined by

$T(x^i e_i) = x^i e'_i$

is an isomorphism compatible with inner products. That is,

8.4    $(x, y) = (T(x), T(y))$

the right-hand side being the inner product in E′ of the image vectors
T(x), T(y). We conclude that any two euclidean vector spaces of the same
dimension and with the same field of scalars are isomorphic, in the sense that
there is an isomorphism from one to the other which preserves vector
lengths and inner products.


THEOREM 8.6 Let U be a finite-dimensional subspace of a euclidean vector space E.
Then $E = U \oplus U^\perp$, the direct sum of U and its orthogonal complement.

Proof. Let $\{e_1, \ldots, e_r\}$ be an orthonormal base for U (one exists,
by Theorem 8.4). For any x in E set

8.5    $x' = (x, e_1)e_1 + \cdots + (x, e_r)e_r$
       $x'' = x - x'$

Then x′ is in U, and

$(x'', e_j) = (x - x', e_j)$
$= (x, e_j) - (x', e_j)$
$= (x, e_j) - \sum_{k=1}^{r} (x, e_k)(e_k, e_j)$
$= (x, e_j) - (x, e_j)(e_j, e_j) = 0$

Therefore x″ is orthogonal to $e_1, \ldots, e_r$, hence to every vector in U.
That is, x″ is in $U^\perp$. It follows that every x in E can be expressed as a
sum of a vector in U and a vector in $U^\perp$. But U and $U^\perp$ have only the
zero vector in common. For if x is in both U and $U^\perp$, it must be orthogonal
to itself. That is, (x, x) = 0, or |x| = 0; hence x = 0. Therefore, by
definition (see Sec. 5, Chap. 8) E is the direct sum of U and $U^\perp$. Q.E.D.


The theorem says that each element x of E can be expressed uniquely as a sum
x = x′ + x″ with x′ in U and x″ in $U^\perp$. The element x′ is called the projection
of x on U. The element x″ is called the perpendicular from x to U. Equation (8.5)
gives explicit formulas. Since (x′, x″) = 0, we have

$(x, x) = (x' + x'', x' + x'')$
$= (x', x') + (x', x'') + (x'', x') + (x'', x'')$
$= (x', x') + (x'', x'')$

and so

8.6    $|x|^2 = |x'|^2 + |x''|^2$

Furthermore, x′ is the element of U that is nearest to x. For if y is any element
of U, then by the law of cosines,

$|x - y|^2 = |(x - x') + (x' - y)|^2$
$= |x - x'|^2 + |x' - y|^2 + 2|x - x'| \cdot |x' - y| \cos \angle(x - x', x' - y)$
$= |x - x'|^2 + |x' - y|^2$

since $\cos \angle(x - x', x' - y) = 0$ (x − x′ is in $U^\perp$ and x′ − y is in U).
Hence, $|x - y|^2 \ge |x - x'|^2$ and so $|x - y| \ge |x - x'|$.
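
Formula (8.5) and relation (8.6) lend themselves to a short numerical check. The following minimal Python sketch is not from the original text; it assumes the standard inner product and an orthonormal base of U supplied by the caller, and the names `inner`, `project`, and `sq_length` are hypothetical.

```python
def inner(x, y):
    """Standard inner product: sum of x_i * conj(y_i)."""
    return sum(complex(a) * complex(b).conjugate() for a, b in zip(x, y))

def project(x, basis):
    """x' of (8.5): the sum of (x, e_k) e_k over an orthonormal base of U."""
    xp = [0j] * len(x)
    for e in basis:
        c = inner(x, e)
        xp = [p + c * ei for p, ei in zip(xp, e)]
    return xp

def sq_length(x):
    """|x|^2 = (x, x)."""
    return inner(x, x).real
```

Taking U to be the plane spanned by the first two unit vectors of $R^3$ and x = (1, 2, 3), one gets x′ = (1, 2, 0) and x″ = (0, 0, 3), so that $|x|^2 = 14 = 5 + 9$, in agreement with (8.6).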


EXERCISES 
Here K denotes either the real or complex field, and $e_1, \ldots, e_n$ denote the
columns of the n × n unit matrix. $K^n$ is to be given its standard inner product.
1. Show that $\{e_1, \ldots, e_n\}$ is an orthonormal base.


2. Taking n = 3, let $B = \{e_1 + e_2 + e_3,\ e_1 + e_2,\ e_3\}$. Compute the base that
is obtained from B by the Schmidt orthonormalization process.

3. Solve Exercise 2 for the base $B = \{5e_1,\ 4e_1 + 3e_2,\ 2e_1 + 3e_2 + 17e_3\}$.

4. Let n = 3 and K = R. Find the element x in the plane spanned by


1 3 
0 and 2 
9, 1, 


3 
that is nearest to the element ( 


) Compute the distance between them. 
1, 

5. In $R_3$ find a nonzero vector which is perpendicular to both $(a_1, a_2, a_3)$ and
$(b_1, b_2, b_3)$. Hint: Expand the following two determinants by their first columns:

$\begin{vmatrix} a_1 & a_1 & b_1 \\ a_2 & a_2 & b_2 \\ a_3 & a_3 & b_3 \end{vmatrix}$    and    $\begin{vmatrix} b_1 & a_1 & b_1 \\ b_2 & a_2 & b_2 \\ b_3 & a_3 & b_3 \end{vmatrix}$

6. In $R_3$, prove that the shortest distance from an element (u, v, w) to the plane
with equation ax + by + cz = 0 is given by

$\dfrac{|au + bv + cw|}{\sqrt{a^2 + b^2 + c^2}}$
7. Let $B = \{u_1, \ldots, u_n\}$ be an orthonormal base in an n-dimensional euclidean
space, and let $B' = \{v_1, \ldots, v_n\}$ be another base. Prove that B′ is orthonormal
if and only if the matrix from B to B′ is unitary.
8. Prove that if the column vectors of an n × n matrix form an orthonormal set
in $K^n$, then so do the row vectors, in $K_n$.


9. Fourier series, Bessel’s inequality 


Here we shall look briefly into some questions of importance in connection with 
infinite-dimensional spaces. 

First of all, let E be any euclidean vector space, and let $u_1, u_2, u_3$, etc., be an
orthonormal set of vectors in E. As usual we denote the inner product in E by
(x, y). For an x in E the numbers

$x_i = (x, u_i)$

are often called the Fourier coefficients of x relative to the set $\{u_1, u_2, \ldots\}$. The
reason for this terminology will be explained presently. We first state the following
theorem.


THEOREM 9.1 The quantity $|x - (c_1 u_1 + \cdots + c_n u_n)|$ has its smallest value when
$c_1, \ldots, c_n$ are the Fourier coefficients of x: $c_i = (x, u_i)$.

Proof. This has been proved in the previous section. For let U be the
subspace of E spanned by $u_1, \ldots, u_n$. Then $E = U \oplus U^\perp$. The vector
$y = c_1 u_1 + \cdots + c_n u_n$ is in U for any scalars $c_i$, of course, and we are
seeking to determine the $c_i$ in such a way as to minimize the distance
|x − y|. We saw in Sec. 8 that this is minimum if and only if
$y = (x, u_1)u_1 + \cdots + (x, u_n)u_n$. Q.E.D.


THEOREM 9.2 (Bessel's inequality) The Fourier coefficients $x_i$ of x satisfy

$\sum_{i=1}^{\infty} |x_i|^2 \le |x|^2$

Proof. We have clearly

$(x - x_1 u_1 - \cdots - x_n u_n,\ x - x_1 u_1 - \cdots - x_n u_n) \ge 0$

Expanding out the left side, we get

$(x, x) - \sum_i x_i (u_i, x) - \sum_i \bar{x}_i (x, u_i) + \sum_{i,j} x_i \bar{x}_j (u_i, u_j) \ge 0$

Now $(u_i, x) = \overline{(x, u_i)} = \bar{x}_i$. Using the orthonormal property of the $u_i$,
our expression above becomes

$(x, x) - \sum_i x_i \bar{x}_i - \sum_i \bar{x}_i x_i + \sum_i x_i \bar{x}_i \ge 0$

or

$(x, x) \ge \sum_{i=1}^{n} |x_i|^2$

This holds for any value of n, from which the assertion follows. Q.E.D.


Now let F denote the set of all continuous real-valued functions on the interval
$-\pi \le x \le \pi$ of real numbers. We make F into a vector space over R by defining
addition and scalar multiplication in the usual way. That is, (f + g)(x) =
f(x) + g(x) and (cf)(x) = c · f(x) for any f, g in F and c in R. Furthermore we
make F into a euclidean vector space by introducing the following symmetric
bilinear form on F:

9.1    $(f, g) = \dfrac{1}{\pi} \int_{-\pi}^{\pi} f(x) g(x)\, dx$

The associated quadratic function (f, f) is positive definite, since

$\dfrac{1}{\pi} \int_{-\pi}^{\pi} f(x)^2\, dx > 0$

for any continuous function except the constant function 0. Therefore the form
(f, g) makes F into a euclidean space. The functions in the set

$B = \left\{ \dfrac{1}{\sqrt{2}}, \cos x, \sin x, \cos 2x, \sin 2x, \ldots, \cos nx, \sin nx, \ldots \right\}$

are all in F, and from the rules of integral calculus it is quickly verified that they
form an orthonormal set.


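Let $U_n$ denote the subspace of F spanned by $B_n = \{1/\sqrt{2}, \cos x, \sin x, \ldots,
\cos nx, \sin nx\}$. Thus dim $U_n$ = 2n + 1. By Theorem 9.1 the element $f_n$ of $U_n$
which is nearest to a given element f of F, that is, for which the distance

$|f - f_n| = \sqrt{(f - f_n, f - f_n)} = \sqrt{\dfrac{1}{\pi} \int_{-\pi}^{\pi} (f(x) - f_n(x))^2\, dx}$

is a minimum, is the function

9.2    $f_n = a_0 \cdot \dfrac{1}{\sqrt{2}} + a_1 \cos x + b_1 \sin x + \cdots + a_n \cos nx + b_n \sin nx$

where the a's and b's are the Fourier coefficients of f. That is,

9.3    $a_0 = \left(f, \dfrac{1}{\sqrt{2}}\right) = \dfrac{1}{\pi\sqrt{2}} \int_{-\pi}^{\pi} f(x)\, dx$

       $a_k = (f, \cos kx) = \dfrac{1}{\pi} \int_{-\pi}^{\pi} f(x) \cos kx\, dx$

       $b_k = (f, \sin kx) = \dfrac{1}{\pi} \int_{-\pi}^{\pi} f(x) \sin kx\, dx$

Bessel's inequality says that

$|f|^2 = \dfrac{1}{\pi} \int_{-\pi}^{\pi} f(x)^2\, dx \ge \sum_{k=0}^{\infty} a_k^2 + \sum_{k=1}^{\infty} b_k^2$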
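
To make (9.3) and Bessel's inequality concrete, here is a minimal Python sketch, not from the original text, that approximates the inner product (9.1) by the midpoint rule. The function names, the step count `steps=4000`, and the choice of test function f(x) = x are assumptions of this sketch; the integrals are only approximate.

```python
import math

def inner(f, g, steps=4000):
    """(f, g) of (9.1), approximated by the midpoint rule on [-pi, pi]."""
    h = 2 * math.pi / steps
    total = 0.0
    for k in range(steps):
        x = -math.pi + (k + 0.5) * h
        total += f(x) * g(x)
    return total * h / math.pi

def fourier_coefficients(f, n):
    """a_0, [a_1..a_n], [b_1..b_n] relative to the orthonormal set B_n."""
    a0 = inner(f, lambda x: 1 / math.sqrt(2))
    a = [inner(f, lambda x, k=k: math.cos(k * x)) for k in range(1, n + 1)]
    b = [inner(f, lambda x, k=k: math.sin(k * x)) for k in range(1, n + 1)]
    return a0, a, b
```

For f(x) = x the cosine coefficients vanish and $b_k = 2(-1)^{k+1}/k$, so the first few terms give $4(1 + 1/4 + 1/9 + \cdots)$, which stays below $|f|^2 = 2\pi^2/3$ as Bessel's inequality requires.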


Now in general, a sequence of elements $v_1, v_2, v_3$, etc., in a euclidean vector space E
is called a Cauchy sequence† if for any positive real number $\epsilon$ there is an integer p
such that

$|v_n - v_m| < \epsilon$    for all m, n > p

A sequence $v_1, v_2, v_3, \ldots$ in E is said to have an element v of E as a limit if for
any positive number $\epsilon$ there is an integer p such that $|v - v_n| < \epsilon$ for all n > p.
The sequence is said to converge to v. It is easily shown that any sequence in E
which has a limit must be a Cauchy sequence, and the limit is unique. The space
E is said to be complete if every Cauchy sequence in E has a limit.

Let us apply these remarks to our function space F. First of all, it is not hard
to show that F is not a complete space. However, if f is an element of F, and if
$f_n$ is the function (9.2), then it is a standard result that the sequence
$f_0, f_1, f_2, \ldots, f_n, \ldots$ is a Cauchy sequence in F having f as its limit. Under
certain circumstances (for example, if f is continuously differentiable) it is possible
to write

$f(x) = \dfrac{a_0}{\sqrt{2}} + \sum_{n=1}^{\infty} (a_n \cos nx + b_n \sin nx)$

the infinite series on the right converging in the sense defined in Chap. 4. The
series is called the Fourier series for f. The $f_n$ above are simply the partial sums of
the Fourier series.

Without entering into details we mention that F can be "completed." That is,
by allowing functions f from a more extensive class, one obtains a complete
euclidean vector space H containing F as a subspace. The space H in question is
an example of a Hilbert space. Such euclidean vector spaces are of great importance
in many applications of mathematics, for example to quantum theory.

† Compare this definition of a Cauchy sequence with those in Chaps. 4, 5. Note that
"distance" is the determining factor here.


Let G denote the set of continuous real-valued functions on the interval
$-1 \le x \le 1$. We can make G into a euclidean vector space, just as with F above, by
defining the inner product of two elements of G by

$(f, g) = \int_{-1}^{1} f(x) g(x)\, dx$

By applying the Schmidt orthonormalization process to the functions

$1, x, x^2, \ldots, x^n, \ldots$

one obtains a sequence of polynomials $P_0(x), P_1(x), P_2(x), \ldots$ which play a role
in G analogous to that of the system B in F. The $P_n(x)$, apart from some numerical
factors, are the Legendre polynomials.
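
The orthogonalization of $1, x, x^2, \ldots$ can be carried out exactly with rational arithmetic. The following minimal Python sketch is not from the original text and departs from the Schmidt process in one labeled respect: it omits the normalization step (which would introduce square roots), so it produces monic polynomials proportional to the $P_n$ rather than the $P_n$ themselves. Polynomials are coefficient lists, entry i multiplying $x^i$; the names `inner` and `orthogonal_polynomials` are hypothetical.

```python
from fractions import Fraction

def inner(p, q):
    """Integral over [-1, 1] of p(x) q(x) for coefficient lists p and q."""
    total = Fraction(0)
    for i, pi in enumerate(p):
        for j, qj in enumerate(q):
            if (i + j) % 2 == 0:               # odd powers integrate to 0
                total += pi * qj * Fraction(2, i + j + 1)
    return total

def orthogonal_polynomials(count):
    """Orthogonalize 1, x, x^2, ... without normalizing (monic results)."""
    polys = []
    for k in range(count):
        mono = [Fraction(0)] * k + [Fraction(1)]   # the monomial x^k
        w = list(mono)
        for p in polys:
            c = inner(mono, p) / inner(p, p)
            for i, pi in enumerate(p):
                w[i] -= c * pi
        polys.append(w)
    return polys
```

The first four results are $1$, $x$, $x^2 - \tfrac{1}{3}$, and $x^3 - \tfrac{3}{5}x$, which are scalar multiples of the Legendre polynomials of degrees 0 to 3.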


EXERCISES 

1. Prove that a finite-dimensional euclidean space is complete. 

2. Prove that if a sequence of elements in a euclidean space has a limit, then it 
is a Cauchy sequence. 

3. Let $a_0, a_1, b_1, a_2, b_2$, etc., be real numbers, and define $f_n$ by the formula (9.2).
Prove that the $f_n$ form a Cauchy sequence if and only if the series
$\sum_{n=1}^{\infty} (a_n^2 + b_n^2)$ converges.

4. Compute the first four polynomials $P_0, P_1, P_2, P_3$ described above. Prove
that $P_n(x)$ is a polynomial of degree n for every n = 0, 1, 2, etc.


10. The eigenvalues of a Hermitian matrix 
Let $a = (a_{ij})$ be an n × n Hermitian matrix. We recall that this means that

10.1    $a_{ij} = \bar{a}_{ji}$    (i, j = 1, ..., n)

In particular, $a_{ii} = \bar{a}_{ii}$, so the diagonal elements are real. Equation (10.1) can be
written as ${}^t a = \bar{a}$.


THEOREM 10.1 The eigenvalues of a Hermitian matrix are real. 

Proof. Let p be an eigenvalue of a. That is, p is a root of the polynomial
$\varphi(t) = \det(tI - a)$. By assumption, $\det(pI - a) = 0$, and so
there is a nonzero vector u in $C^n$ such that (pI − a)u = 0, or

10.2    au = pu

(See corollary, Theorem 5.3, Chap. 11). Now for any x, y in $C^n$ define

$A(x, y) = {}^t\bar{y}\, a\, x = \sum_{i,j} a_{ij}\, \bar{y}^i x^j$

This is a Hermitian form on $C^n$, as is quickly verified, using (10.1). Therefore
A(x, x) is real. Now multiply (10.2) on the left by the row vector
${}^t\bar{u}$, getting

${}^t\bar{u}\, a\, u = p \cdot {}^t\bar{u}\, u$

or

$A(u, u) = p \cdot {}^t\bar{u}\, u = p \cdot \sum_{i=1}^{n} \bar{u}^i u^i = p \sum_{i=1}^{n} |u^i|^2$

or

10.3    $p = \dfrac{A(u, u)}{\sum_{i=1}^{n} |u^i|^2}$

This shows that p is real. Q.E.D.
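
Formula (10.3) can be checked on a small case. The following minimal Python sketch is not from the original text; it treats only a 2 × 2 Hermitian matrix whose off-diagonal entry is nonzero, finds the eigenvalues from the quadratic formula, and evaluates the quotient in (10.3). The names `form`, `eigenpairs_2x2`, and `quotient` are hypothetical.

```python
import math

def form(a, x, y):
    """A(x, y) = sum over i, j of conj(y[i]) * a[i][j] * x[j]."""
    n = len(a)
    return sum(complex(y[i]).conjugate() * a[i][j] * x[j]
               for i in range(n) for j in range(n))

def eigenpairs_2x2(a):
    """Eigenvalues and eigenvectors of a 2 x 2 Hermitian a with a[0][1] != 0."""
    p, b, d = complex(a[0][0]).real, complex(a[0][1]), complex(a[1][1]).real
    mid = (p + d) / 2
    rad = math.sqrt(((p - d) / 2) ** 2 + abs(b) ** 2)
    pairs = []
    for lam in (mid + rad, mid - rad):
        u = [b, lam - p]          # solves (p - lam) * u[0] + b * u[1] = 0
        pairs.append((lam, u))
    return pairs

def quotient(a, u):
    """The right side of (10.3): A(u, u) divided by the sum of |u_i|^2."""
    return form(a, u, u) / sum(abs(ui) ** 2 for ui in u)
```

For the matrix with rows (2, 1 − i) and (1 + i, 3) the eigenvalues are 4 and 1, and for each eigenvector the quotient has zero imaginary part and real part equal to the eigenvalue.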
COROLLARY The eigenvalues of a real symmetric n × n matrix are real numbers.
For such a matrix is a special case of a Hermitian matrix.
We recall that an endomorphism T of a euclidean vector space E is called
self-adjoint if

10.4    (Tx, y) = (x, Ty)

for all x, y in E. Assume now that E has dimension n, and let $B = \{e_1, \ldots, e_n\}$
be an orthonormal base in E. Let $a = (a^i_j)$ be the matrix of T relative to the
base B. Then by definition $T(e_j) = a^i_j e_i$. By (8.2) we have

10.5    $a^i_j = (Te_j, e_i)$

Similarly,

$a^j_i = (Te_i, e_j)$
$= \overline{(e_j, Te_i)}$    using $(y, x) = \overline{(x, y)}$
$= \overline{(Te_j, e_i)}$    by (10.4)
$= \bar{a}^i_j$    by (10.5)

Hence the matrix of a self-adjoint endomorphism relative to an orthonormal base is a
Hermitian matrix.


THEOREM 10.2 Let T be a self-adjoint linear mapping of a finite-dimensional
euclidean vector space E over the field of real or of complex numbers. Then there is an
orthonormal base in E consisting entirely of eigenvectors of T. Moreover, the
eigenvalues of T are real.

Proof. According to Definitions 5.1, 5.2, Chap. 11, the eigenvalues of
T are the same as the eigenvalues of the matrix of T relative to any base
in E. As we have just seen, the matrix of T relative to an orthonormal
base is Hermitian, therefore has real eigenvalues, by Theorem 10.1. This
proves the last assertion of the theorem.

To prove the rest of the theorem, let $p_1, p_2, \ldots, p_r$ be the distinct
eigenvalues of T. Denoting by I the identity mapping of E, the operator
$p_j I$ also maps E to itself, whether the field of scalars is R or C, since $p_j$ is
real. Let $V_j$ denote the kernel of the operator $p_j I - T$. That is, $V_j$ consists
of all vectors x such that $(p_j I - T)(x) = 0$, or $T(x) = p_j x$. In other
words, $V_j$ consists of all eigenvectors belonging to the eigenvalue $p_j$; $V_j$
is a subspace of E. First of all we show that

10.6    $V_j$ and $V_k$ are orthogonal subspaces if j ≠ k.

We must show that the inner product (x, y) is zero if x is in $V_j$ and y is
in $V_k$. This is very easy. First of all, by definition we have $Tx = p_j x$ and
$Ty = p_k y$, and $p_j \neq p_k$. By (10.4),

$(Tx, y) = (x, Ty)$

whence

$(p_j x, y) = (x, p_k y)$

or

$p_j (x, y) = p_k (x, y)$

where on the right we have used the fact that $p_k$ is real. The last equation
can be written

$(p_j - p_k)(x, y) = 0$

and so

$(x, y) = 0$

Now set

10.7    $V = V_1 + \cdots + V_r$

That is, V consists of all vectors v in E such that v can be expressed as a
sum

10.8    $v = v_1 + \cdots + v_r$    with $v_j$ in $V_j$ (j = 1, ..., r)

It is clear that V is a subspace of E. Furthermore it is T-stable. For
applying T to (10.8) gives us

$T(v) = T(v_1) + \cdots + T(v_r) = p_1 v_1 + \cdots + p_r v_r$

Since each $p_j v_j$ is in $V_j$, it follows that T(v) is in V.
Now let E′ denote the orthogonal complement of V. By Theorem 8.6,
E is the direct sum

10.9    $E = V \oplus E'$

From this it follows that E′ is also T-stable. For let w be an element of
E′. By definition, (v, w) = 0 for every v in V. By (10.4), (v, Tw) =
(Tv, w). Since Tv is in V, we have (Tv, w) = 0, whence (v, Tw) = 0,
showing that Tw is orthogonal to all vectors in V, hence is in E′.

We next show that E′ consists of the zero vector alone. For let T′ denote
the restriction of T to E′. If dim E′ > 0, then T′ must have at least
one eigenvalue q (T′ is again self-adjoint, so its eigenvalues are real by the
first part of the proof); and accordingly there is a nonzero vector w in E′ such
that T′w = qw. But T′ is the same as T on E′, and so Tw = qw. Hence,
w is an eigenvector for T, and so q is an eigenvalue for T, therefore must
be the same as one of the $p_j$, say $p_k$. But then w is in $V_k$, hence in V, which
contradicts (10.9). It follows that dim E′ cannot be greater than zero.

Hence, from (10.9) and (10.7) we now have

10.10    $E = V_1 + \cdots + V_r$

Finally, let $B_j$ be an orthonormal base for $V_j$ (j = 1, ..., r), and let
$B = \{B_1, \ldots, B_r\}$ be the combined set of vectors obtained by putting
all the $B_j$ together. From (10.6) it is immediate that B is an orthonormal
set; from (10.10), B must span E; from Proposition 8.1, the vectors in B
are linearly independent. Therefore B is an orthonormal base for E, and
each vector in B is an eigenvector of T since each vector of B is in one of
the $V_j$. Q.E.D.


COROLLARY 1 Let $\varphi(t)$ be the characteristic polynomial of T, and let

10.11    $\varphi(t) = (t - p_1)^{n_1} (t - p_2)^{n_2} \cdots (t - p_r)^{n_r}$

be its factorization, where $p_1, p_2, \ldots, p_r$ are the distinct eigenvalues of T. Then E
is the direct sum of the subspaces $V_j = \mathrm{Ker}\,(p_j I - T)$, j = 1, ..., r; and
dim $V_j = n_j$.

Proof. The fact that $E = V_1 \oplus \cdots \oplus V_r$ follows at once from the
observation that the joint base $B = \{B_1, \ldots, B_r\}$ is a base for E, where
$B_j$ is a base for $V_j$ (j = 1, ..., r). We have then
dim E = dim $V_1$ + ··· + dim $V_r$ and also dim E = $n_1 + \cdots + n_r$. But dim $V_j \le n_j$,
by the corollary to Theorem 5.6, Chap. 11. From these relations it is
clear that dim $V_j = n_j$. Q.E.D.


COROLLARY 2 Let a be a Hermitian n × n matrix. Then there is a unitary matrix
c such that

$c^{-1} a c = p$

where p is a diagonal matrix with real coefficients.

Proof. Let $T: C^n \to C^n$ be the mapping that sends a column vector x
into ax, $C^n$ being equipped with its standard inner product

10.12    $(x, y) = x^1 \bar{y}^1 + \cdots + x^n \bar{y}^n$

Then a is the matrix of T relative to the canonical base $B_0 = \{e_1, \ldots, e_n\}$,
where $e_j$ denotes the jth column of the n × n unit matrix; and T is
self-adjoint for the inner product (10.12). Furthermore, $B_0$ is an orthonormal
base in $C^n$ for (10.12). By Theorem 10.2, there is an orthonormal
base $B = \{u_1, \ldots, u_n\}$ in $C^n$ such that each $u_i$ is an eigenvector of T.
The matrix of T relative to B is therefore a diagonal matrix p with real
entries. Let c be the matrix from $B_0$ to B; it is unitary, since both bases are
orthonormal (Exercise 7, Sec. 8). Then $p = c^{-1} a c$. Q.E.D.


REMARK 1. Applying (6.20) we see that the equation $c^{-1} a c = p$ can also be
written ${}^t\bar{c}\, a\, c = p$. Furthermore, if a happens to be real, hence symmetric, we
can replace $C^n$ in the argument above by $R^n$, and it is then clear that the matrix c
will be real, hence orthogonal.


REMARK 2. It should be noted that the column vectors of the matrix c form an
orthonormal set of eigenvectors for the Hermitian matrix a.
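
For a real symmetric 2 × 2 matrix the orthogonal matrix c of Remark 1 can be written down explicitly. The following minimal Python sketch is not from the original text; it uses a single plane rotation whose angle is chosen to annihilate the off-diagonal entry (the device underlying the classical Jacobi method), and the names `matmul` and `diagonalize_symmetric_2x2` are hypothetical.

```python
import math

def matmul(a, b):
    """Ordinary matrix product of two matrices given as lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]

def diagonalize_symmetric_2x2(a):
    """Return (c, d) with c a rotation and d = transpose(c) * a * c diagonal."""
    p, q, r = a[0][0], a[0][1], a[1][1]
    theta = 0.5 * math.atan2(2 * q, p - r)     # angle killing the (1, 2) entry
    co, si = math.cos(theta), math.sin(theta)
    c = [[co, -si], [si, co]]
    ct = [[co, si], [-si, co]]                 # transpose of c, equal to c^{-1}
    d = matmul(ct, matmul(a, c))
    return c, d
```

For the matrix with rows (2, 1) and (1, 2) the angle is π/4, the columns of c are $(1, 1)/\sqrt{2}$ and $(-1, 1)/\sqrt{2}$, and d is diagonal with entries 3 and 1, in line with Remark 2.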


11. Simultaneous diagonalization of two Hermitian forms


From the results of the preceding section we can easily deduce an important
theorem concerning Hermitian forms. We begin with an auxiliary result.


THEOREM 111 Lei F and H be Hermitian forms on a finite-dimensional vecior space 
V over R or C, with H nondegenerate. Then there exists a linear transformation T 
of V to itself such that F(x, y) = H(Tx, y) for all xand y. Furthermore, T is self- 
adjoint for H; that is, 

ACTx, y) = HO, Ty) 

Proof. Let B = {u,...,0,} beabasein V. The matrices of F and 

H relative to B are then given by 

ana ai; = Fu,u) and by; = H(a, u;) 


as in Sec. 6. By assumption, the matrix $b = (b_{ij})$ is nonsingular; that is,
det b ≠ 0. Write $a = (a_{ij})$ for the matrix of F, and define c by
$c = {}^t(ab^{-1})$, or


$a = {}^t c\, b$


If we write the elements of c as $c^i_j$, then according to our row and column
conventions the last equation is the same as


11.2   $a_{ij} = c^k_i b_{kj}$


Define T to be the linear transformation such that

$T u_i = c^k_i u_k$


Then, for $x = x^i u_i$ and $y = y^j u_j$ we have

$H(Tx, y) = H(x^i c^k_i u_k, y^j u_j)$

$= x^i c^k_i \bar{y}^j H(u_k, u_j)$

$= x^i c^k_i b_{kj} \bar{y}^j = x^i a_{ij} \bar{y}^j = F(x, y)$


using the Hermitian property of H, along with (11.1) and (11.2). Finally,
for any x, y, $H(x, Ty) = \overline{H(Ty, x)} = \overline{F(y, x)} = F(x, y) = H(Tx, y)$,
showing that T is self-adjoint for H. Q.E.D.
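
The construction in this proof can be tried on a small case. The sketch below assumes real symmetric 2 × 2 matrices (so that complex conjugation plays no role) and exact rational arithmetic; the sample matrices and all helper names are illustrative choices, not taken from the text. It forms $c = {}^t(ab^{-1})$ and compares F(x, y), H(Tx, y), and H(x, Ty).

```python
from fractions import Fraction

def matmul(x, y):
    return [[sum(x[i][k] * y[k][j] for k in range(len(y)))
             for j in range(len(y[0]))] for i in range(len(x))]

def transpose(m):
    return [list(col) for col in zip(*m)]

def inverse_2x2(m):
    (p, q), (r, s) = m
    det = p * s - q * r
    if det == 0:
        raise ValueError("matrix is singular")
    return [[s / det, -q / det], [-r / det, p / det]]

def form(m, x, y):
    """Value at (x, y) of the real bilinear form whose matrix is m."""
    return sum(x[i] * m[i][j] * y[j] for i in range(2) for j in range(2))

def apply(c, x):
    """Coordinates of Tx, where T u_i = c^k_i u_k and c[k][i] stores c^k_i."""
    return [sum(c[k][i] * x[i] for i in range(2)) for k in range(2)]

a = [[Fraction(1), Fraction(2)], [Fraction(2), Fraction(0)]]  # matrix of F
b = [[Fraction(2), Fraction(1)], [Fraction(1), Fraction(1)]]  # matrix of H (nonsingular)
c = transpose(matmul(a, inverse_2x2(b)))                      # c = t(a b^{-1})

x, y = [Fraction(3), Fraction(-1)], [Fraction(2), Fraction(5)]
print(form(a, x, y) == form(b, apply(c, x), y))               # F(x, y) = H(Tx, y)
print(form(b, apply(c, x), y) == form(b, x, apply(c, y)))     # T is self-adjoint for H
```

Using `Fraction` keeps the comparison exact, so the two identities can be tested with `==` rather than with a tolerance.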


THEOREM 11.2 (Principal-axis theorem) Let F and H be Hermitian forms on a
finite-dimensional vector space V over R or C, with H positive definite. Then there is
a base in V for which the matrices of both F and H are diagonal matrices with real
entries, and in fact such a base can be found for which the matrix of H is the unit
matrix.

Proof. V with the form H is a euclidean vector space, since H is a
positive definite Hermitian form. Let T: V → V be the linear mapping
of the preceding theorem, so that F(x, y) = H(Tx, y). Since T is
self-adjoint, it follows from Theorem 10.2 that there is an orthonormal base
$B = \{u_1, \ldots, u_n\}$ in V consisting entirely of eigenvectors of T, say
$T u_i = p_i u_i$ (i = 1, ..., n). For the matrix of H relative to B we have

$H(u_i, u_j) = \delta_{ij} = \begin{cases} 1 & \text{if } i = j \\ 0 & \text{if } i \neq j \end{cases}$


since B is orthonormal. For the matrix of F we get


$F(u_i, u_j) = H(Tu_i, u_j)$
$= H(p_i u_i, u_j)$
$= p_i H(u_i, u_j) = p_i \delta_{ij}$


Hence the matrix of H relative to B is the unit matrix $I = (\delta_{ij})$, and the
matrix of F relative to B is the diagonal matrix whose entries are the
eigenvalues $p_1, \ldots, p_n$ of T, necessarily real numbers. Q.E.D.


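For the base B just described, if $x = x^i u_i$ and $y = y^j u_j$, we have
$H(x, y) = H(x^i u_i, y^j u_j) = x^i \bar{y}^j H(u_i, u_j) = \delta_{ij} x^i \bar{y}^j$, or


11.3   $H(x, y) = x^1 \bar{y}^1 + \cdots + x^n \bar{y}^n$

Similarly,

11.4   $F(x, y) = p_1 x^1 \bar{y}^1 + \cdots + p_n x^n \bar{y}^n$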
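
As an illustration of Theorem 11.2, the following sketch finds a base that diagonalizes two real symmetric forms at once in dimension 2. It is written under stated assumptions: both matrices are real 2 × 2, the matrix of H is positive definite, and the two eigenvalues are distinct. The function name and the sample matrices are hypothetical. The eigenvalues are obtained as the roots of det(a − p b) = 0, which is equivalent to the eigenvalue problem for the mapping T of Theorem 11.1.

```python
import math

def form(m, x, y):
    """Value at (x, y) of the real bilinear form whose matrix is m."""
    return sum(x[i] * m[i][j] * y[j] for i in range(2) for j in range(2))

def simultaneous_diagonalize_2x2(a, b):
    """a: symmetric matrix of F; b: symmetric positive definite matrix of H.
    Returns (eigenvalues, base); the base is orthonormal for H and each of
    its vectors is an eigenvector of T. Assumes two distinct eigenvalues."""
    # det(a - p*b) = qa*p^2 + qb*p + qc
    qa = b[0][0] * b[1][1] - b[0][1] ** 2
    qb = -(a[0][0] * b[1][1] + a[1][1] * b[0][0] - 2 * a[0][1] * b[0][1])
    qc = a[0][0] * a[1][1] - a[0][1] ** 2
    disc = qb * qb - 4 * qa * qc
    if disc <= 0:
        raise ValueError("this sketch assumes two distinct eigenvalues")
    roots = [(-qb + sign * math.sqrt(disc)) / (2 * qa) for sign in (1, -1)]
    base = []
    for p in roots:
        m = [[a[i][j] - p * b[i][j] for j in range(2)] for i in range(2)]
        # A nonzero row (r, s) of the singular matrix m has null vector (s, -r).
        row = max(m, key=lambda r: abs(r[0]) + abs(r[1]))
        v = [row[1], -row[0]]
        norm = math.sqrt(form(b, v, v))   # length of v for the inner product H
        base.append([v[0] / norm, v[1] / norm])
    return roots, base

a = [[1, 2], [2, 0]]   # matrix of F
b = [[2, 1], [1, 1]]   # matrix of H, positive definite
eigenvalues, base = simultaneous_diagonalize_2x2(a, b)
for i in range(2):
    print([round(form(b, base[i], base[j]), 9) for j in range(2)],
          [round(form(a, base[i], base[j]), 9) for j in range(2)])
```

Relative to the computed base, the matrix of H is the unit matrix and the matrix of F is diagonal, as in (11.3) and (11.4).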


To illustrate the geometrical meaning of Theorem 11.2, consider the vector
space $R_n$ of real row vectors. As we have seen in Sec. 4, a quadratic form Q on $R_n$ is a
mapping $R_n \to R$ given by an expression


11.5   $Q(x) = \sum_{i,j=1}^{n} a_{ij} x_i x_j$


where $x = (x_1, \ldots, x_n)$ and where $a = (a_{ij})$ is a symmetric real matrix. With
Q is associated the symmetric bilinear form


11.6   $F(x, y) = \sum_{i,j=1}^{n} a_{ij} x_i y_j$


Introduce the standard inner product

11.7   $(x, y) = \sum_{i=1}^{n} x_i y_i$

From the results of Sec. 5 we know that there is a base in $R_n$ for which (11.6)
reduces to a diagonal form (see Fig. 1). Theorem 11.2 adds the information that
such a base can be found which is orthonormal. If $B = \{u_1, \ldots, u_n\}$ is a base
with this property, and if we denote the components of a vector $x = (x_1, \ldots, x_n)$
relative to B by $x'_1, \ldots, x'_n$, so that $x = x'_1 u_1 + \cdots + x'_n u_n$, then (11.6) reduces
to the form (11.4), and

$Q(x) = p_1 (x'_1)^2 + \cdots + p_n (x'_n)^2$

Since the base is orthonormal, we have

$|x|^2 = x_1^2 + \cdots + x_n^2 = (x'_1)^2 + \cdots + (x'_n)^2$


Figure 1 The locus of the equation Q = 1 with Q the quadratic form (11.5) is called
a quadric surface. One can choose orthogonal coordinate axes along principal axes so
as to reduce the equation of a quadric surface to forms such as the following.

(A) Ellipsoid: $\frac{x_1^2}{a_1^2} + \frac{x_2^2}{a_2^2} + \frac{x_3^2}{a_3^2} = 1$

(B) One-sheeted hyperboloid: $\frac{x_1^2}{a_1^2} + \frac{x_2^2}{a_2^2} - \frac{x_3^2}{a_3^2} = 1$

(C) Two-sheeted hyperboloid: $\frac{x_1^2}{a_1^2} - \frac{x_2^2}{a_2^2} - \frac{x_3^2}{a_3^2} = 1$


Let us consider the locus defined by Q(x) = 1 for the case n = 3. Then our
equation can be written


$Q(x) = p_1 (x'_1)^2 + p_2 (x'_2)^2 + p_3 (x'_3)^2 = 1$


For purposes of illustration suppose that the $p_i$ are positive (so that Q is positive
definite). We can then write $p_i = 1/a_i^2$, where $a_i$ is a positive real number, and
our equation becomes


$\left(\frac{x'_1}{a_1}\right)^2 + \left(\frac{x'_2}{a_2}\right)^2 + \left(\frac{x'_3}{a_3}\right)^2 = 1$

This is the equation of an ellipsoid, with semi-axes equal to $a_1, a_2, a_3$. The base
vectors $u_1, u_2, u_3$ give the directions of the principal axes of the ellipsoid. (If the
three eigenvalues $p_1, p_2, p_3$ are different, then the principal axes are uniquely
determined. If two or more of them coincide, then that is no longer the case.
For example, if they are all equal, then our equation above is the equation of a
sphere.) Observe that the line in $R_3$ determined by the vector $u_1$ is the same as
the locus $x'_2 = 0$, $x'_3 = 0$. The same is true for the other two axes of the ellipsoid.
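
Once the eigenvalues are known, the type of the locus Q(x) = 1 for n = 3 can be read off from their signs. The sketch below assumes three nonzero eigenvalues; the function name `describe_central_quadric` is hypothetical. For positive eigenvalues the returned lengths are the semi-axes $a_i = 1/\sqrt{p_i}$ of the text; for negative ones they are the analogous quantities $1/\sqrt{|p_i|}$.

```python
import math

def describe_central_quadric(p):
    """Classify the locus p1*x1'^2 + p2*x2'^2 + p3*x3'^2 = 1 in principal-axis
    coordinates, for three nonzero eigenvalues p1, p2, p3."""
    if len(p) != 3 or any(value == 0 for value in p):
        raise ValueError("expects three nonzero eigenvalues")
    positives = sum(1 for value in p if value > 0)
    names = {3: "ellipsoid",
             2: "one-sheeted hyperboloid",
             1: "two-sheeted hyperboloid",
             0: "empty locus"}
    # a_i = 1/sqrt(|p_i|) corresponds to the semi-axis a_i of the text.
    lengths = [1.0 / math.sqrt(abs(value)) for value in p]
    return names[positives], lengths

print(describe_central_quadric([1, 0.25, 4]))
print(describe_central_quadric([1, 1, -1]))
```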


EXERCISES

1. Let Q be the quadratic form on $R_3$ defined by (11.5), with matrix

$\begin{pmatrix} 1 & 2 & 0 \\ 2 & 1 & -3 \\ 0 & -3 & 2 \end{pmatrix}$

Describe the locus Q(x) = 1 in $R_3$.

2. Let a be a symmetric n × n matrix with real coefficients. Let $u_1, \ldots, u_n$
be a set of orthonormal eigenvectors of a (why does such a set exist?), and let
$q = (u_1, \ldots, u_n)$ be the matrix having these vectors as its columns. Prove that q
is an orthogonal matrix and that ${}^t q\, a q$ is a diagonal matrix whose entries are the
eigenvalues of a. (The matrix q is sometimes called the modal matrix of a.)

*3. Let a be as in the preceding exercise. Show that the largest eigenvalue of
a is equal to the greatest value of ${}^t x\, a x$ for all unit vectors x in $R^n$.


12. Unitary matrices


We mention here a few useful facts concerning unitary matrices. Recall first that
a matrix a with complex coefficients is called unitary if

12.1   ${}^t\bar{a} = a^{-1}$


(A real matrix with this property, that is, ${}^t a = a^{-1}$, is called orthogonal.)
By a discussion which closely parallels the proofs of Theorems 10.1 and 10.2 one
can easily demonstrate the following facts:


12.2 The eigenvalues of a unitary matrix are complex numbers of absolute value 1. 


12.3 If T is a unitary mapping of a finite-dimensional euclidean space E over the
field of complex numbers, then there is an orthonormal base in E consisting
entirely of eigenvectors of T. Moreover, the eigenvalues of T have absolute
value 1.


12.4 If a is a unitary matrix, then there is a unitary matrix u such that

$u^{-1} a u = d$

where d is a diagonal matrix whose diagonal entries have absolute value 1.
The proofs of these assertions are left as exercises. 


EXAMPLE Let T be a unitary mapping of a finite-dimensional euclidean vector
space E over C. Consider the sequence of linear mappings


12.5   $T_1, T_2, \ldots, T_k, \ldots$

where

$T_k = \frac{1}{k}(I + T + \cdots + T^{k-1})$

Does the sequence (12.5) converge to some limiting linear mapping? We can
answer in the affirmative as follows: According to (12.3) we can find an orthonormal
base B in E for which the matrix of T is a diagonal matrix d with diagonal elements
$p_1, p_2, \ldots, p_n$ such that $|p_i| = 1$. It follows that the matrix of $T_k$ relative to the
base B is the sum


$\frac{1}{k}(I + d + \cdots + d^{k-1})$

This is again a diagonal matrix, the ith diagonal element being


$\frac{1}{k}(1 + p_i + \cdots + p_i^{k-1}) = \begin{cases} \dfrac{1}{k} \cdot \dfrac{1 - p_i^k}{1 - p_i} & \text{if } p_i \neq 1 \\ 1 & \text{if } p_i = 1 \end{cases}$


Hence, since $|p_i| = 1$, we have

$\lim_{k \to \infty} \frac{1}{k}(1 + p_i + \cdots + p_i^{k-1}) = \begin{cases} 1 & \text{if } p_i = 1 \\ 0 & \text{if } p_i \neq 1 \end{cases}$


It follows that

$\lim_{k \to \infty} \frac{1}{k}(I + T + \cdots + T^{k-1})$


exists, its matrix relative to B being a diagonal matrix with ith entry equal to 1 or 0
according to whether


$p_i = 1$   or   $p_i \neq 1$


This same result, in the case of a complete infinite-dimensional euclidean space, 
is a celebrated theorem known as the mean ergodic theorem. It has important 
applications in statistical mechanics. The proof involves some increased technical 
difficulties but is based on essentially the same idea as the foregoing argument. 
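
The behavior of the averaged diagonal entries in the example can be observed numerically. This is a small sketch, assuming the averages have the form $(1/k)(1 + p + \cdots + p^{k-1})$ for a complex number p of absolute value 1; the function name `cesaro_mean` is an illustrative choice.

```python
import cmath

def cesaro_mean(p, k):
    """(1/k) * (1 + p + ... + p^(k-1)) for a complex number p."""
    total, power = 0j, 1 + 0j
    for _ in range(k):
        total += power
        power *= p
    return total / k

for k in (10, 100, 10000):
    # p = 1 gives the constant value 1; p = exp(i) gives values shrinking toward 0
    print(k, abs(cesaro_mean(1, k)), abs(cesaro_mean(cmath.exp(1j), k)))
```

For $p \neq 1$ the absolute value of the mean is at most $2/(k\,|1 - p|)$, which is why the printed values decay roughly like 1/k.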


EXERCISES

1. Write out a proof of (12.2).

2. Prove (12.3).

3. Prove (12.4).

4. Let $M_n$ denote the set of all n × n matrices with complex coefficients. Define
an inner product in $M_n$ by $(a, b) = \mathrm{Tr}(a\,{}^t\bar{b})$, where Tr is the mapping that sends
any matrix in $M_n$ into the sum of its diagonal elements. Prove that $M_n$ is a
euclidean space of dimension $n^2$ over C.

5. With $M_n$ as in the preceding exercise, consider the sum

$a_k = I + a + \frac{a^2}{2!} + \cdots + \frac{a^k}{k!}$

for any a in $M_n$. Prove that $a_1, a_2, a_3, \ldots$ so defined form a Cauchy sequence in
$M_n$. Since $M_n$, being finite-dimensional, is complete, this Cauchy sequence has a
limit, denoted by exp a.

6. Let b be a nonsingular matrix in $M_n$. Show that

$\exp(bab^{-1}) = b(\exp a)b^{-1}$

7. Show that if a is a Hermitian matrix in $M_n$, then exp a is a positive definite
Hermitian matrix.

8. Let a be a skew-Hermitian matrix in $M_n$. That is, ${}^t\bar{a} = -a$. Prove that
exp a is unitary.


13. Vector products in oriented 3-space 


The present section deals with some observations which are important for many
physical applications. We deal here exclusively with vector spaces over the real
field R. The questions to be discussed here are taken up in a more general form in
Chap. 16.




First of all, everyone is familiar with the notion of orientation in the plane or in
3-space. For example, by common convention, a coordinate system (x, y, z) in
physical 3-space is called right-handed if a right-handed screw placed along the
z axis, and rotated through 90 deg in the sense from the x axis to the y axis, moves
in the positive direction along the z axis.

The question of orientation is very important for many purposes, and we now 
show how the concept may be transcribed into mathematical terms for any finite- 
dimensional vector space V over R. 


Let $B = \{u_1, \ldots, u_n\}$ and $B' = \{v_1, \ldots, v_n\}$ be two bases for V.
Then they are connected by a certain n × n matrix $a = (a^i_j)$:

13.1   $v_j = a^i_j u_i$

The inverse relation

13.2   $u_j = b^i_j v_i$


is given by another matrix $b = (b^i_j)$, and the two matrices are inverses of each
other: ab = ba = I. From general rules for determinants we have


(det a)(det b) = 1

Hence either det a and det b are both positive or are both negative.


DEFINITION 13.1 Let V be a finite-dimensional vector space over R. By an orientation
of V is meant a mapping h which assigns to each base B of V a number h(B), either
+1 or −1, in such a way that if a is the matrix from B to B', then


13.3   $h(B) \cdot h(B') \cdot \det a > 0$


It is easy to see that there are exactly two orientations of V and that if h is one of
them, then −h is the other. For select a fixed base $B_0$ in V, and start off by
defining $h(B_0) = +1$, say. We claim that this determines h(B) uniquely for any
base B in such a way as to satisfy (13.3). For if c is the matrix from $B_0$ to B, then
(13.3) requires us to have $h(B_0) \cdot h(B) \cdot \det c > 0$, or $h(B) \cdot \det c > 0$, since we
have put $h(B_0) = 1$. Hence we set h(B) = 1 if det c > 0 and h(B) = −1 if
det c < 0. In this way h(B) is determined for all bases. To show that (13.3) holds,
let B, B' be two bases, and let c, c' be the matrices from $B_0$ to B and from $B_0$ to B',
respectively. As just observed, we must have $h(B) \cdot \det c > 0$ and $h(B') \cdot \det c' > 0$.
Therefore $h(B) \cdot h(B') (\det c)(\det c') > 0$. Now the matrix a from B to B' is
easily seen to be equal to $c^{-1}c'$. Hence


$h(B) \cdot h(B') \cdot \det a = h(B) \cdot h(B') (\det c^{-1})(\det c')$
$= h(B) \cdot h(B') (\det c)^{-1} (\det c')$


$= h(B) \cdot h(B') \dfrac{(\det c)(\det c')}{(\det c)^2} > 0$




If we had started off with $h(B_0) = -1$, then all the signs h(B) would have been
reversed. This proves the statements made above.

By an oriented vector space V we mean a finite-dimensional space over R
equipped with one of the two possible orientations. In general there is no reason
for preferring one orientation to the other. But we observe that for the space $R^n$
there is a natural orientation, namely, that one which assigns +1 to the canonical
base $\{e_1, \ldots, e_n\}$, where as usual $e_j$ denotes the jth column of the n × n unit
matrix.

$B = \{u_1, \ldots, u_n\}$ being any base in V, if we set $B' = \{-u_1, u_2, \ldots, u_n\}$,
then the matrix from B to B' has determinant −1, and so from (13.3) we see that
h(B) and h(B') must have opposite signs. If n ≥ 2 and we set
$B' = \{u_2, u_1, u_3, \ldots, u_n\}$, then again h(B) and h(B') must have opposite signs.
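
The natural orientation of $R^n$ can be computed directly from a determinant, following the construction above with the canonical base as $B_0$. The sketch below is illustrative only; the function names `det` and `natural_orientation` are assumptions, and the determinant is computed by cofactor expansion, which is adequate for small n.

```python
def det(m):
    """Determinant by expansion along the first row (adequate for small n)."""
    if len(m) == 1:
        return m[0][0]
    return sum((-1) ** j * m[0][j] * det([row[:j] + row[j + 1:] for row in m[1:]])
               for j in range(len(m)))

def natural_orientation(base):
    """h(B) for a base B of R^n, normalized so that h(canonical base) = +1.
    `base` lists the vectors of B; their coordinates form the matrix from the
    canonical base to B up to transposition, which does not change det."""
    d = det([list(v) for v in base])
    if d == 0:
        raise ValueError("the vectors do not form a base")
    return 1 if d > 0 else -1

e1, e2, e3 = (1, 0, 0), (0, 1, 0), (0, 0, 1)
print(natural_orientation([e1, e2, e3]))          # canonical base
print(natural_orientation([(-1, 0, 0), e2, e3]))  # first vector reversed
print(natural_orientation([e2, e1, e3]))          # first two vectors interchanged
```

The last two calls correspond to the two modifications of B described above, each of which reverses the sign.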

An orientation h on a vector space splits the bases into two classes, one class
consisting of bases B for which h(B) = +1, and the other class consisting of bases B
for which h(B) = −1. Equation (13.3) says that the matrix connecting two bases
in the same class has a positive determinant. The bases for which h(B) = +1 can
be thought of as analogous to right-handed coordinate systems in physical 3-space.
But as we have remarked above, there is in general no way of singling out a
preferred orientation in a vector space, and therefore there is no way of deciding which
class of bases should be called right- or left-handed.


Now let E be an oriented two-dimensional euclidean vector space over R. We
shall see that the orientation, denote it by h, enables us to define the notion of
positive and negative angles. Namely, let u and v be two nonzero vectors in E. In
Sec. 11, Chap. 8, and again in Sec. 7 of the present chapter we defined the angle θ
between u and v by


13.4   $\cos\theta = \dfrac{(u, v)}{|u| \cdot |v|}$   (0 ≤ θ ≤ π)


where (u, v) denotes the inner product in E.

We can now attach a sign to θ as follows. If v is not a multiple of u (otherwise
θ = 0 or θ = π), then {u, v} is a base for E. We now define the oriented angle
$\angle(u, v)$ by


13.5   $\angle(u, v) = h(\{u, v\}) \cdot \theta$


where θ is as in (13.4) and where h({u, v}) is the number +1 or −1 which the
orientation h assigns to the base {u, v}. If we reverse u and v, then h changes sign,
and therefore


13.6   $\angle(v, u) = -\angle(u, v)$


One can easily verify that the oriented angle obeys the usual elementary rules of
calculation, namely,


13.7   $\angle(u, w) = \angle(u, v) + \angle(v, w)$




provided of course that angles differing by multiples of 2π are identified. (Clearly
angles should be regarded as elements of the factor group R/P, where P is the
subgroup of the additive group of R consisting of all integral multiples of 2π.)
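
The oriented angle is easy to compute in $R^2$ with its natural orientation. The sketch below is written under that assumption; the function name `oriented_angle` is hypothetical. The sign is taken from the determinant of the matrix from the canonical base to {u, v}, and the additivity rule (13.7) is checked modulo 2π on one sample triple.

```python
import math

def oriented_angle(u, v):
    """Oriented angle from u to v in R^2 with its natural orientation."""
    dot = u[0] * v[0] + u[1] * v[1]
    cos_theta = dot / (math.hypot(*u) * math.hypot(*v))
    theta = math.acos(max(-1.0, min(1.0, cos_theta)))  # 0 <= theta <= pi, as in (13.4)
    d = u[0] * v[1] - u[1] * v[0]  # det of the matrix from {e1, e2} to {u, v}
    return theta if d >= 0 else -theta

u, v, w = (1.0, 0.0), (0.0, 1.0), (-1.0, -1.0)
total = oriented_angle(u, v) + oriented_angle(v, w)
direct = oriented_angle(u, w)
# The two values agree up to an integral multiple of 2*pi.
print((total - direct) / (2 * math.pi))
```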


Now let us consider a three-dimensional oriented euclidean vector space E over
R, the orientation being denoted by h, as before. We are now going to define a very
useful binary operation × in E.


DEFINITION 13.2 If u and v are linearly independent vectors in E, then u × v denotes
the vector defined by the conditions


13.8   $(u, u \times v) = (v, u \times v) = 0$
       $|u \times v| = |u| \cdot |v| \sin\theta$
       $h(\{u, v, u \times v\}) = 1$

where (x, y) denotes the inner product in E and where θ is the angle between u and v,
with 0 < θ < π. If u, v are linearly dependent, then u × v is defined to be the zero
vector in E (see Fig. 2).

The first condition says that u × v is orthogonal to both u and v. The set of all
vectors x such that (u, x) = (v, x) = 0 is a one-dimensional subspace L of E
(assuming that u and v are linearly independent). Let w be a unit vector in L. Then
w spans L, and since u × v must be in L, we shall have u × v = cw, for some
number c. Then $|u \times v| = |cw| = |c| \cdot |w| = |c|$, and so $c = \pm |u| \cdot |v| \sin\theta$, by the
second condition of (13.8). It remains only to fix the sign of c. The third condition
of (13.8) requires that h assign the value +1 to the base {u, v, cw}, and this
determines the sign of c. Hence (13.8) defines u × v uniquely. We observe that the
second condition of (13.8) can be rewritten slightly as follows:


$|u \times v|^2 = |u|^2 |v|^2 \sin^2\theta = |u|^2 |v|^2 (1 - \cos^2\theta)$


Figure 2 $|u| \cdot |v| \sin\theta$ is the area of the parallelogram formed by u and v. The direction
of u × v depends on the orientation selected for 3-space. Thus the outer product of
two vectors is preserved by any unitary transformation T, provided that T preserves
orientation.


Using (13.4) we get


13.9   $|u \times v|^2 = |u|^2 |v|^2 - (u, v)^2$


We now show how to compute u × v: Let $B = \{e_1, e_2, e_3\}$ be an orthonormal
base in E, and put


$u = u^i e_i$,   $v = v^i e_i$

For a vector $x = x^i e_i$ the condition (u, x) = (v, x) = 0 becomes

13.10   $u^1 x^1 + u^2 x^2 + u^3 x^3 = 0$,   $v^1 x^1 + v^2 x^2 + v^3 x^3 = 0$


since B is orthonormal. We can easily solve these equations as follows:
We have


13.11   $\begin{vmatrix} u^1 & u^1 & v^1 \\ u^2 & u^2 & v^2 \\ u^3 & u^3 & v^3 \end{vmatrix} = 0$


since two columns are identical. Expanding by the first column we get

$u^1 t^1 + u^2 t^2 + u^3 t^3 = 0$


where

13.12   $t^1 = \begin{vmatrix} u^2 & v^2 \\ u^3 & v^3 \end{vmatrix}$,   $t^2 = \begin{vmatrix} u^3 & v^3 \\ u^1 & v^1 \end{vmatrix}$,   $t^3 = \begin{vmatrix} u^1 & v^1 \\ u^2 & v^2 \end{vmatrix}$


Replacing the first column by the $v^i$ gives similarly

$v^1 t^1 + v^2 t^2 + v^3 t^3 = 0$


Furthermore, if u and v are linearly independent, then $t^1, t^2, t^3$ cannot all be zero.
For the matrix consisting of the last two columns of (13.11) must have rank 2, and
so at least one of the 2 × 2 determinants (13.12) is nonzero. But in fact it is a
straightforward matter to check that


13.13   $(t^1)^2 + (t^2)^2 + (t^3)^2 = |u|^2 |v|^2 - (u, v)^2$

Hence, by (13.9), the vector


13.14   $t = t^i e_i$




satisfies the first two conditions of (13.8). Therefore $u \times v = \pm t$. To determine
the sign, let us now suppose that the orthonormal base B is such that h(B) = 1.
Since $h(\{u, v, u \times v\}) = 1$ also, it follows from Definition 13.1 that the matrix
from B to $\{u, v, u \times v\}$ must have positive determinant. Now the matrix a from
B to $\{u, v, t\}$ is simply


13.15   $a = \begin{pmatrix} u^1 & v^1 & t^1 \\ u^2 & v^2 & t^2 \\ u^3 & v^3 & t^3 \end{pmatrix}$


To calculate its determinant expand by the third column. Taking account of
(13.12) one obtains


13.16   $\det a = (t^1)^2 + (t^2)^2 + (t^3)^2 = |t|^2$

This is positive, of course, and it follows that we must take

13.17   $u \times v = t = (u^2 v^3 - u^3 v^2) e_1 + (u^3 v^1 - u^1 v^3) e_2 + (u^1 v^2 - u^2 v^1) e_3$

Hence we have achieved our aim of computing u × v. Observe that the quantities
$t^1, t^2, t^3$ are all zero if u and v are linearly dependent. Therefore the formula above
gives the correct result in this case, too.
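
Formula (13.17) translates directly into code. The following minimal sketch assumes components taken relative to a positive orthonormal base; the helper names `cross` and `dot` are illustrative. It also checks the first condition of (13.8) and identity (13.9) on sample vectors.

```python
def cross(u, v):
    """Components of u x v by formula (13.17), relative to a positive orthonormal base."""
    return (u[1] * v[2] - u[2] * v[1],
            u[2] * v[0] - u[0] * v[2],
            u[0] * v[1] - u[1] * v[0])

def dot(u, v):
    """Inner product of two vectors given by components in an orthonormal base."""
    return sum(x * y for x, y in zip(u, v))

u, v = (1, 2, -1), (3, 0, 4)
t = cross(u, v)
print(t)
print(dot(u, t), dot(v, t))                                   # both vanish: t is orthogonal to u and v
print(dot(t, t) == dot(u, u) * dot(v, v) - dot(u, v) ** 2)    # identity (13.9)
```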
It is plain from (13.17) that

13.18   $v \times u = -u \times v$,   $u \times u = 0$


a result which also follows directly from Definition 13.2.

The operation × is variously referred to as the outer product, or the cross product,
or the vector product. It is anticommutative, as the preceding equation shows.
From the formula (13.17) it is simple to verify that × is bilinear, i.e., that it
satisfies the following distributive laws:


13.19   $(au + bw) \times v = a(u \times v) + b(w \times v)$
        $u \times (cv + dw) = c(u \times v) + d(u \times w)$


However, × is not an associative operation.
Using (13.17) one can easily show that × satisfies the following modified
associative law, known as the Jacobi identity:†


13.20   $u \times (v \times w) + w \times (u \times v) + v \times (w \times u) = 0$
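
Both the failure of associativity and the Jacobi identity can be observed on concrete vectors. The sketch below uses integer components relative to a positive orthonormal base; `cross` and `add` are illustrative helper names, and the sample vectors are arbitrary.

```python
def cross(u, v):
    """u x v by formula (13.17)."""
    return (u[1] * v[2] - u[2] * v[1],
            u[2] * v[0] - u[0] * v[2],
            u[0] * v[1] - u[1] * v[0])

def add(*vectors):
    """Componentwise sum of several vectors."""
    return tuple(sum(components) for components in zip(*vectors))

e1, e2 = (1, 0, 0), (0, 1, 0)
# Associativity fails: the two bracketings give different vectors.
print(cross(e1, cross(e1, e2)), cross(cross(e1, e1), e2))

u, v, w = (1, 2, -1), (3, 0, 4), (2, -5, 1)
# The Jacobi sum is the zero vector.
print(add(cross(u, cross(v, w)), cross(w, cross(u, v)), cross(v, cross(w, u))))
```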


We recall from Sec. 6, Chap. 12, that $|u| \cdot |v| \sin\theta$ is the area of the parallelogram
spanned by u and v. From (13.8) it follows that the length of u × v is defined to
be equal to that area. According to Sec. 6, Chap. 11, again, the volume of the
parallelepiped spanned by u, v, u × v is equal to the determinant of (13.15), hence
to $|u|^2 |v|^2 \sin^2\theta$, by (13.16).

We mention the following formulas:


13.21   $e_1 \times e_2 = e_3$,   $e_1 \times e_3 = -e_2$,   $e_2 \times e_3 = e_1$


† After Karl G. J. Jacobi (1804-1851).




where $\{e_1, e_2, e_3\}$ is any orthonormal base for which h = +1. These follow at once
from Definition 13.2. Further, if $u = u^i e_i$, $v = v^i e_i$, $w = w^i e_i$, then


$(u \times v, w) = t^1 w^1 + t^2 w^2 + t^3 w^3$

where the $t^i$ are as in (13.12). We have then


13.22   $(u \times v, w) = \begin{vmatrix} u^1 & v^1 & w^1 \\ u^2 & v^2 & w^2 \\ u^3 & v^3 & w^3 \end{vmatrix}$


as follows at once by expanding by the third column.


EXAMPLE Consider $R_3$, with its standard inner product, its natural orientation,
and its canonical base B: $e_1 = (1, 0, 0)$, $e_2 = (0, 1, 0)$, $e_3 = (0, 0, 1)$. We recall
that the natural orientation h is that for which h(B) = 1. Let u = (1, 2, −1),
v = (3, 0, 4). Then, by (13.17),


$u \times v = 8e_1 - 7e_2 - 6e_3 = (8, -7, -6)$


REMARK 1. E above with its vector-space operations and the cross product ×
is an example of a Lie algebra.† In general, a Lie algebra is a finite-dimensional
vector space equipped with a product operation satisfying (13.18), (13.19), (13.20).
Such systems are very important in the study of continuous groups.


REMARK 2. An operation generalizing the cross product for spaces of dimension 
greater than 3 is studied in Chap. 16. 


EXERCISES 

1. Prove the formula (13.7) for oriented angles.

2. Prove Eq. (13.13).

3. Given linearly independent vectors u and v in an oriented three-dimensional
euclidean vector space, describe a simple way of obtaining from them an
orthonormal base $\{e_1, e_2, e_3\}$ such that $e_1 = cu$, where c > 0.

4. Prove the following formula for any vectors a, b, c in an oriented euclidean
3-space:

$a \times (b \times c) = (c, a) \cdot b - (b, a) \cdot c$

5. Calculate the area of the parallelogram P in $R_3$ spanned by the vectors
u = (3, 0, −1) and v = (1, 2, −2).

6. Let $\{e_1, e_2, e_3\}$ be an orthonormal base in an oriented euclidean 3-space.
Assuming that the base is positive, calculate

(a) (e1 X e:)- es 
(b) (e1 X €2) X e@ 
(e) (€1 — €: + Bes) X (Ze, + e2) 

7. Find the volume of the tetrahedron in $R_3$ whose vertices are (0, 0, 0),
(1, −1, 3), (2, 4, 1), (3, 4, 5).

† After Sophus Lie (1842-1899).




8. Prove the identity

$((u \times v), (w \times z)) = (u, w)(v, z) - (u, z)(v, w)$

for the cross product.

9. Prove that

$\begin{vmatrix} (u, u) & (u, v) \\ (v, u) & (v, v) \end{vmatrix} = |u \times v|^2$

in an oriented euclidean 3-space.

10. Using the cross product, prove the law of sines

$\dfrac{a}{\sin\alpha} = \dfrac{b}{\sin\beta} = \dfrac{c}{\sin\gamma}$

for a triangle in $R_3$.

11. Prove the Jacobi identity (13.20).


14. Analytic geometry in n dimensions 


In this section we generalize the considerations of Sec. 12, Chap. 8, to study certain
geometric questions in n-dimensional affine space. Our aim here is merely to
indicate how linear algebra can be applied to such problems. We refer the reader
to Secs. 10 and 11, Chap. 8, for definitions of affine and euclidean space. Recall
that one obtains a coordinate system in $E_n$ by selecting a point $p_0$ as the origin and
a base $e_1, e_2, \ldots, e_n$ for the vector space of translations $T(E_n)$. Then for any
point q the translation vector $\overrightarrow{p_0 q}$ can be expressed uniquely as


$\overrightarrow{p_0 q} = x^1 e_1 + \cdots + x^n e_n$

We write $q = (x^1, \ldots, x^n)$ with respect to $p_0; e_1, \ldots, e_n$, or simply
$q = (x^1, \ldots, x^n)$ when there is no risk of misunderstanding. If the base is
orthonormal, we call the coordinate system euclidean.

Let us first discuss k-dimensional affine subspaces $\Gamma$ of $E_n$ (see Sec. 10, Chap. 8
for a slightly different point of view). Such a subspace $\Gamma$ is, by definition, the
set of all points obtained from a fixed point $q_0$ by a translation in a k-dimensional
subspace $T(\Gamma)$ of $T(E_n)$. Let $u_1, \ldots, u_k$ be a base for $T(\Gamma)$, and suppose that

$u_j = \sum_{i=1}^{n} a^i_j e_i$   (j = 1, 2, ..., k)

Then, if q is any point of $\Gamma$, $\overrightarrow{q_0 q} \in T(\Gamma)$ and so there exist scalars
$\lambda_1, \lambda_2, \ldots, \lambda_k$ such that

$\overrightarrow{q_0 q} = \sum_{j=1}^{k} \lambda_j u_j = \sum_{j=1}^{k} \sum_{i=1}^{n} \lambda_j a^i_j e_i$

Thus, if $q_0 = (x_0^1, \ldots, x_0^n)$, $q = (x^1, \ldots, x^n)$, then

14.1   $x^i = x_0^i + \sum_{j=1}^{k} \lambda_j a^i_j$   (i = 1, 2, ..., n)




These are the parametric equations of $\Gamma$, the "parameters" being $\lambda_1, \ldots, \lambda_k$,
and as they range over all possible elements of the field R, the point $(x^1, \ldots, x^n)$
ranges over all points of $\Gamma$.

To obtain another form of these equations, let us write them in matrix
notation as


14.2   $X = \Lambda A$


where $X = (x^1 - x_0^1, \ldots, x^n - x_0^n)$, $\Lambda = (\lambda_1, \ldots, \lambda_k)$, $A = (a^i_j)$ are 1 × n,
1 × k, and k × n matrices, respectively. Note that rank A = k since $u_1, \ldots, u_k$
are linearly independent. Interpreting A as the matrix of a linear transformation
F: $V^n \to W^k$ of an n-dimensional vector space $V^n$ to a k-dimensional vector space
$W^k$ (with respect to some bases), we see that the dimension of the kernel of F is
n − k (Theorem 3.2, Chap. 9). If we denote this kernel by $U^{n-k}$ and pick a base
in it, we obtain a matrix B representing the inclusion map G: $U^{n-k} \to V^n$; clearly
the composition FG is the zero map and so AB = 0. In other words, we have
found an n × (n − k) matrix B of rank n − k such that AB = 0, and so
multiplying (14.2) on the right by B, we obtain


$XB = 0$


This may be written as a system of n − k equations:


14.3   $\sum_{i=1}^{n} b_{ij}(x^i - x_0^i) = 0$   (j = 1, 2, ..., n − k)


or, finally, as


14.4   $\sum_{i=1}^{n} b_{ij} x^i = c_j$   (j = 1, 2, ..., n − k)


where $c_j = \sum_{i=1}^{n} b_{ij} x_0^i$. Conversely such a system of equations represents a
k-dimensional affine subspace, as follows from Theorem 8.5, Chap. 9.

THEOREM 14.1 If $B = (b_{ij})$ is a matrix of rank n − k, the set of all points
$(x^1, \ldots, x^n)$ in $E_n$ satisfying the system of equations


$\sum_{i=1}^{n} b_{ij} x^i = c_j$   (j = 1, 2, ..., n − k)


is a k-dimensional affine subspace of $E_n$, and conversely, any k-dimensional subspace
may be described in this fashion.

In particular, if dim ($\Gamma$) = n − 1, then n − k = 1 and the single equation


14.5   $b_1 x^1 + \cdots + b_n x^n = c$




represents the hyperplane $\Gamma$. If $p_0 = (x^1, \ldots, x^n)$, $p_1 = (y^1, \ldots, y^n)$ are two
distinct points of $\Gamma$, then $\sum b_i x^i = \sum b_i y^i = c$, and so

$b_1(x^1 - y^1) + \cdots + b_n(x^n - y^n) = 0$

Thus the vector $b = (b_1, \ldots, b_n)$ is orthogonal to the translation vector $\overrightarrow{p_0 p_1}$,
where $p_0$, $p_1$ are arbitrary points of $\Gamma$. This provides a geometric interpretation
of the coefficients $b_1, \ldots, b_n$.

As a sample of the type of question easily answered using the properties of
(euclidean) affine spaces, let us ask, What is the shortest distance from a point q
(not on $\Gamma$) to the hyperplane $\Gamma$?

From what we have just seen, an orthonormal basis for $T(E_n)$ may be obtained
by adjoining to an orthonormal basis of $T(\Gamma)$ the unit vector b/|b|, which we
denote by n. Clearly


$n = \dfrac{b}{|b|}$   where   $|b| = \sqrt{b_1^2 + \cdots + b_n^2}$


It is now a consequence of the discussion following the proof of Theorem 8.6
that the required shortest distance is


$D = |(\overrightarrow{pq}, n)|$


where p is an arbitrary point of $\Gamma$ (see Fig. 3).
Taking $q = (q^1, \ldots, q^n)$, $p = (p^1, \ldots, p^n)$, a straightforward computation
yields the following theorem.


THEOREM 14.2 The shortest distance D from the point $q = (q^1, \ldots, q^n)$ to the
hyperplane $b_1 x^1 + \cdots + b_n x^n = c$ is given by

14.6   $D = \dfrac{|b_1 q^1 + \cdots + b_n q^n - c|}{\sqrt{b_1^2 + \cdots + b_n^2}}$
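
The distance formula of Theorem 14.2 can be evaluated in a few lines. The sketch below assumes a euclidean coordinate system and the formula in the form $|b_1 q^1 + \cdots + b_n q^n - c| / |b|$; the function name `distance_to_hyperplane` is an illustrative choice.

```python
import math

def distance_to_hyperplane(b, c, q):
    """Shortest distance from the point q to the hyperplane b_1 x^1 + ... + b_n x^n = c,
    with coordinates taken in a euclidean coordinate system."""
    norm = math.sqrt(sum(bi * bi for bi in b))
    if norm == 0:
        raise ValueError("b must be a nonzero vector")
    return abs(sum(bi * qi for bi, qi in zip(b, q)) - c) / norm

print(distance_to_hyperplane([3, 4], 5, [0, 0]))        # a line in the plane
print(distance_to_hyperplane([1, 1, 1], 1, [0, 0, 0]))  # a plane in 3-space
```

A point lying on the hyperplane gives distance zero, which is a quick sanity check on the sign conventions.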
So far we have studied affine subspaces which are described in terms of linear
equations. Let us now consider quadrics, the sets of points $(x^1, \ldots, x^n)$ in $E_n$
which satisfy a quadratic equation:


14.7   $\sum_{i,j=1}^{n} a_{ij} x^i x^j + \sum_{i=1}^{n} b_i x^i + c = 0$



Figure 3 




We may assume that $a_{ij} = a_{ji}$, 1 ≤ i, j ≤ n, (why?) and so apply Corollary 2,
Sec. 10, which tells us that there is an orthogonal change of basis in $T(E_n)$,
transforming the euclidean coordinate system $(x^1, \ldots, x^n)$ to another $(y^1, \ldots, y^n)$
and such that (14.7) in the new system reads as


14.8   $\sum_{i=1}^{n} p_i (y^i)^2 + \sum_{i=1}^{n} q_i y^i + r = 0$


By a permutation of the base elements, we may assume that $p_i \neq 0$, i = 1,
2, ..., k ≤ n and $p_i = 0$, k < i ≤ n. We can eliminate the linear terms in
$y^1, \ldots, y^k$ as in Sec. 12, Chap. 8, by a translation of the coordinate system
given by

$z^i = y^i + q_i/(2p_i)$   (i = 1, 2, ..., k)

$z^i = y^i$   (i > k)


and (14.8) becomes


14.9   $\sum_{i=1}^{k} p_i (z^i)^2 + \sum_{i=k+1}^{n} q_i z^i + s = 0$


If all $q_i$ (i > k) vanish, we obtain an equation of the form


14.10   $\sum_{i=1}^{k} p_i (z^i)^2 + s = 0$


On the other hand, if not all $q_i$ vanish, we make a change of coordinate system:


$w^i = z^i$   (i = 1, 2, ..., k)

$w^{k+1} = \dfrac{1}{Q} \sum_{i=k+1}^{n} q_i z^i$

where Q and $w^{k+2}, \ldots, w^n$ are so chosen that the transformation is orthogonal
(see Exercise 8). A final translation enables us to eliminate the constant term and
we obtain the equation


14.11   $\sum_{i=1}^{k} p_i (w^i)^2 + Q w^{k+1} = 0$



Thus we have two possible types of equations: (14.10) and (14.11). Some of the
possibilities for n = 3 are illustrated below. Note that the original polynomial of
(14.7) was reduced to that of (14.10) and (14.11) by a succession of translations
and orthogonal transformations of $T(E_n)$. The various transformations we have
used act in $T(E_n)$, not in $E_n$, so that the geometric figure represented by Eq. (14.7)
is left unchanged but the equation describing it is successively simplified by
changing coordinate systems. Exactly the same manipulations can also be interpreted




Figure 4 (A) The sphere: (a)? + (x*)? + (x)? = 1. (B) Its image under Ty, Ti(', 
x, x!) = (x!, 2%, x). (C) Its image under T;, Ta(x!, x, x9) = (x4, 2? — 38, x4). 


from a different point of view, which leans on our geometric intuition. In this
view, the coordinate system is fixed once and for all, and the transformations,
acting now on $E_n$, transform the points $(x^1, \ldots, x^n)$ satisfying (14.7) into
$(z^1, \ldots, z^n)$ satisfying (14.10) or $(w^1, \ldots, w^n)$ satisfying (14.11). In other
words the geometric figure represented by Eq. (14.7) is mapped onto one given by
(14.10) or (14.11) (see Fig. 4). The point in either approach is to study the
original quadric by replacing it with one whose equation is more perspicuous. However,
we must be careful that the geometric properties in which we are interested are
preserved by the transformations employed. This leads us to a point of view first
clearly expressed by F. Klein in his celebrated and extremely influential "Erlangen
program" (1872),‡ according to which geometry is the study of properties of
geometric entities invariant under certain groups of transformations. Thus,
† The points 1, 2, 3 denote $e_1(0)$, $e_2(0)$, $e_3(0)$, respectively, where 0 is the origin and
$e_1, e_2, e_3$ an orthonormal basis for $T(E_3)$.
‡ English translation in Bulletin of New York Mathematical Society, vol. 2, 1892.


Figure 5 The hyperboloid of one sheet $(x^1)^2 + (x^2)^2 - (x^3)^2 = 1$.


different groups of transformations yield different geometries. It is not our in- 
tention here to develop these considerations at length; let us merely indicate their 
connection with the preceding discussion of affine subspaces and quadrics. 

Figure 6 The hyperbolic paraboloid $(x^1)^2 - (x^2)^2 + x^3 = 0$.

Let $E_n$ denote a euclidean vector space, which we regard as an affine space (cf.
page 191, Example 1). The transformations we have been using are affine
transformations of $E_n$, namely, compositions of linear transformations and translations.
If we consider only nonsingular linear mappings, the resulting set is a group (why?)
called the affine group, $A_n$. If $f \in A_n$, its action on $E_n$ is given by the equation


$f(x) = Ax + v$

where A is a nonsingular linear mapping of $E_n$ and v is a fixed (that is, independent
of x) element of $E_n$. Its inverse is clearly given by $f^{-1}(y) = A^{-1}(y - v)$, and its
product with g, g(x) = Bx + w, is given by $(g \cdot f)(x) = g(f(x)) = BA(x) + (Bv + w)$.
The affine group $A_n$ preserves k-dimensional affine subspaces and
quadrics, in the sense that the image under any element of $A_n$ of an affine subspace
(a quadric) is an affine subspace of the same dimension (a quadric). An affine
transformation f is called a rigid motion if it preserves lengths of vectors, that is,
if |Ax| = |x| for every $x \in E_n$; it is volume-preserving if it leaves the volume of all
parallelotopes invariant (cf. Sec. 6, Chap. 11). It is easily verified that the
translations, the rigid motions, and the volume-preserving affine transformations all
constitute subgroups of $A_n$, the first one normal (what is the quotient group?).
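
The composition and inverse formulas for affine transformations can be exercised on a small case. This sketch assumes the plane (n = 2), column coordinates, and exact rational arithmetic; an affine map is represented by a hypothetical pair (A, v), and all function names are illustrative.

```python
from fractions import Fraction

def mat_vec(a, x):
    return [sum(a[i][j] * x[j] for j in range(len(x))) for i in range(len(a))]

def mat_mul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]

def inverse_2x2(a):
    (p, q), (r, s) = a
    det = Fraction(p * s - q * r)
    if det == 0:
        raise ValueError("the linear part must be nonsingular")
    return [[s / det, -q / det], [-r / det, p / det]]

def apply(f, x):
    """f(x) = Ax + v for f = (A, v)."""
    a, v = f
    return [s + t for s, t in zip(mat_vec(a, x), v)]

def compose(g, f):
    """g . f, that is x -> g(f(x)) = BAx + (Bv + w)."""
    (b, w), (a, v) = g, f
    return mat_mul(b, a), [s + t for s, t in zip(mat_vec(b, v), w)]

def inverse(f):
    """f^{-1}(y) = A^{-1}(y - v), stored as the pair (A^{-1}, -A^{-1} v)."""
    a, v = f
    a_inv = inverse_2x2(a)
    return a_inv, [-s for s in mat_vec(a_inv, v)]

f = ([[2, 1], [1, 1]], [1, -3])
g = ([[0, 1], [-1, 2]], [4, 0])
x = [5, 7]
print(apply(compose(g, f), x) == apply(g, apply(f, x)))
print(apply(inverse(f), apply(f, x)) == x)
```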
We have proved above:
Any quadric can be transformed by a rigid motion into a quadric whose
equation has the form (14.10) or (14.11).
We conclude this brief section by studying a certain quantity associated with
every quadric and its invariance under affine transformations.
Referring to equation (14.7), we set $c = a_{n+1,n+1}$ and $b_i = 2a_{i,n+1}$
(i = 1, ..., n).

DEFINITION The discriminant of the quadric of (14.7) is the (n + 1) × (n + 1)
determinant $|(a_{ij})|$. It will be denoted by $\Delta(P)$ where P is the polynomial in (14.7).

Thus the discriminant of the quadric $\sum_{i,j=1}^{n} a_{ij} x^i x^j + 1 = 0$ equals the
determinant $|(a_{ij})|$ (i, j = 1, ..., n).

THEOREM 14.3 Let T denote the affine transformation T(x) = Ax + v of $E_n$, and
TP the polynomial P(T(x)). Then

$\Delta(TP) = \det{}^2(A) \cdot \Delta(P)$


Proof. To a point x = (x^1, ..., x^n) of E_n let us associate the point
x' = (x^1, ..., x^n, 1) of E_{n+1}. Introduce the polynomial
P'(x^1, ..., x^{n+1}) in n + 1 variables by the equation

    P'(x^1, ..., x^{n+1}) = Σ_{i,j=1}^{n+1} a_ij x^i x^j        (a_ij = a_ji)

Note that this is a quadratic form on E_{n+1} and that it is uniquely
determined by the condition P'(x') = P(x). Associate, finally, to the affine
transformation T on E_n the linear transformation T' on E_{n+1} as follows: if
M and r denote the n × n matrix and the row vector representing A and v,
T' is represented by the (n + 1) × (n + 1) matrix

    M' = ( M  0 )
         ( r  1 )

It is immediate from these definitions that T'(x') = (T(x))'. Hence,
(TP)'(x') = TP(x) = P(T(x)) = P'[(T(x))'] = P'[T'(x')], and the
quadratic form (TP)' on E_{n+1} is obtained from P' by the linear transformation
T'. Denote by μ(Q) the matrix of the quadratic form Q. Then, by (3.9),

    μ((TP)') = ᵗM' μ(P') M'

and

    Δ(TP) = det μ((TP)') = det^2(M') · det μ(P')
          = det^2(M) · Δ(P)

the identity det^2(M') = det^2(M) being obtained by expanding in minors
according to the last column of M'.
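
The identity can be spot-checked numerically. The following is only a sketch for n = 2 with made-up integer data; the names `S`, `M`, `r`, `M_ext` are hypothetical, and the row-vector convention T(x) = xM + r is an assumption made here so that the extended matrix has the block shape used in the proof.

```python
def det(m):
    """Determinant by expansion along the first row (adequate for small matrices)."""
    if len(m) == 1:
        return m[0][0]
    return sum((-1) ** j * m[0][j] * det([row[:j] + row[j + 1:] for row in m[1:]])
               for j in range(len(m)))

def matmul(a, b):
    """Product of two matrices given as lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]

def transpose(a):
    return [list(col) for col in zip(*a)]

def quad(s, v):
    """Value of the quadratic form with symmetric matrix s at the vector v."""
    return sum(v[i] * s[i][j] * v[j] for i in range(len(v)) for j in range(len(v)))

# Hypothetical data: S is the symmetric 3 x 3 matrix of P'.
S = [[1, 2, 3], [2, 0, 1], [3, 1, 4]]
M = [[2, 1], [0, 3]]
r = [5, -1]
M_ext = [M[0] + [0], M[1] + [0], r + [1]]          # the matrix M'
S_T = matmul(matmul(M_ext, S), transpose(M_ext))   # matrix of (TP)' in this convention

x_ext = [1, 2, 1]                                  # x' for the point x = (1, 2)
Tx_ext = matmul([x_ext], M_ext)[0]                 # T'(x'), which should equal (T(x))'
```

Here `det(S_T)` can be compared with `det(M) ** 2 * det(S)`, and `quad(S_T, x_ext)` with `quad(S, Tx_ext)`, which is the relation (TP)'(x') = P'[T'(x')] at one sample point.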




It follows from the preceding theorem that the sign of the discriminant of a
quadric is preserved by A_n and that the discriminant itself is preserved by the
subgroup of volume-preserving affine transformations (cf. Exercise 9).


EXERCISES 
1. Find the equation of the hyperplane in E_n passing through the points
(1, 0, ..., 0), (0, 1, 0, ..., 0), ..., (0, 0, ..., 0, 1).


2. Find the shortest distance from the origin to the hyperplane of Exercise 1. 

3. Can one generalize the discussion leading to Theorem 14.2 to obtain the
shortest distance from a point in E_n to a k-dimensional affine subspace?

4. Describe the surfaces in E_3 whose equations are
Gee +y 
x a z 
@g-p7aot @T+g7i¢7 71 

5. Under what conditions will surfaces in E_3 given by the equations below have
the property that through every point of the surface there passes a straight line 
contained entirely in the surface? 

Ma? + ee + Ate = 2 
Mae + dete + waz = 0 
*6. Generalize Exercise 5 to E,. 

7. Prove that surfaces whose equation is given by (14.10) have "central
symmetry," i.e., if a point lies on the surface, so does its "reflection" through the origin.
Conversely, show that if a quadric surface possesses a center of symmetry as above,
then its equation may be reduced to the form (14.10).

8. If v is a nonzero row (or column) vector, then an orthogonal matrix can 
always be found containing a multiple of v as a row (or column). 

9. Find necessary and sufficient conditions in order that the affine
transformation f, f(x) = Ax + v, be volume-preserving. Show that these transformations
form a group. Show that for any f in A_n, the ratio volume(P)/volume(f(P)) is
constant for all parallelotopes P in E_n.
10. Let A be an affine space over an arbitrary field K. Let p_1, ..., p_n be
points of A.
(a) Prove ¥> As pop.lin) = > Asap: (qo) for any points po, go in A, provided 
7 


(a) y = xz 


that λ_1 + ... + λ_n = 1.

(b) Define λ_1 p_1 + ... + λ_n p_n, where λ_1 + ... + λ_n = 1, by the expression in
(a). Define a mapping f: A → A to be affine if and only if f(Σ λ_i p_i) = Σ λ_i f(p_i)
whenever λ_1 + ... + λ_n = 1. Define an affine mapping to be linear with respect
to p_0 if f(p_0) = p_0.

(ec) Prove that an affine map f is a translation if and only if f(p) * p for all 
pin A, 

(d) Prove that an affine mapping is a composition of a linear mapping and a 
translation, 


15 


Quotient structures 


In this chapter we shall describe the fundamental notion of quotient set, and we 
shall illustrate its importance by numerous mathematical constructions. Several 
of the topics presented here have already been encountered in previous chapters, 
but they are treated in this chapter from a more advanced point of view. 


1. Mappings 


Let A and B be sets. We wish to give a definition of a mapping from A to B. In 
Chap. 1, we defined a mapping from A to B as a rule which assigns to each element 
of A an element of B. While this definition has served us up to this point, the 
authors must confess that the notion of a rule has been left vague—indeed, too 
vague for the standards of precision that we impose on ourselves. We wish there- 
fore to give a more precise definition of a mapping. 

Before presenting the definition, we call to the reader’s attention the role that 
mappings play in mathematics: simply to let us know which elements of A corre- 
spond to which elements of B. Put another way, a mapping serves only to provide 
the information for any ordered pair (x, y) with x in A and y in B: Is y assigned
to x?

In a pragmatic spirit, therefore, we bypass the difficult problem of defining a 
rule and we adopt the following definition of a mapping. 


DEFINITION 1.1 Let A and B be sets. A × B is the set of all ordered pairs (x, y)
with x in A and y in B.


DEFINITION 1.2 A mapping f from A to B is a subset of A × B satisfying the
following condition:

For each element x in A, there is one and only one element y in B such that
(x, y) is in f; this element y is denoted by f(x). We say that "f assigns to x the
element f(x)" or "f sends x into f(x)." If f is a mapping of A to B, we sometimes
write f: A → B.

Thus, to specify a mapping f is equivalent to specifying, for each x in A, the ele- 
ment f(x) in B. A mapping of A to B is sometimes called a function from A to B. 
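
For finite sets, Definition 1.2 can be tested mechanically. The sketch below is an illustration only; the helper name `is_mapping` and the sample sets are hypothetical and not part of the text.

```python
def is_mapping(f, A, B):
    """Return True if the set of ordered pairs f is a mapping from A to B."""
    # f must be a subset of A x B
    if not all(x in A and y in B for (x, y) in f):
        return False
    # each x in A must occur as a first coordinate exactly once
    return all(sum(1 for (u, _) in f if u == x) == 1 for x in A)

A = {0, 1, 2}
B = {0, 1, 2, 3, 4}
double = {(x, 2 * x) for x in A}          # the pairs (x, 2x)
not_a_mapping = {(0, 0), (0, 1), (1, 2)}  # 0 has two images, 2 has none
```

With these definitions, `is_mapping(double, A, B)` holds while `is_mapping(not_a_mapping, A, B)` fails, since the second set of pairs violates the "one and only one" condition.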




EXAMPLE 1 Let Z be the system of integers, and let f be the subset of Z × Z
consisting of all ordered pairs

    (x, 2x)

with x in Z. This f is a mapping. An equivalent description of this mapping f is:
f is the mapping of Z to Z given by

    f(x) = 2x

for all x in Z.


EXAMPLE 2 Let K be a field, and let f be the subset of K × K consisting of all
ordered pairs (x, x^2), with x in K. Here f(x) = x^2 for all x in K.

It may be noted that what we are calling "the mapping f from A to B" is
precisely what some textbooks call "the graph of the function f." Our definition of
mapping is inspired by the observation that the graph of a function f gives the
value of f(x) for each x and therefore yields a perfectly adequate definition of
"function."


EXAMPLE 3 A binary operation on a set A is a mapping of A × A into A. In the
usual multiplicative notation, the binary operation assigns to the ordered pair
(a, b) the element a · b.

Given a mapping f from a set A to a set B, we call A the domain of f and B the 
range of f. The subset of B consisting of all elements f(x) with x in A is called the 
image of f, or the image of A under f; it is denoted by f(A). If B = f(A), the 
mapping f is called surjective. If f(x) = f(x’) implies x = x’, the mapping f is 
called injective. A mapping f is called bijective if it is both injective and surjective.
A bijective mapping is frequently called a one-to-one mapping. 

For any mapping f of A to B, one denotes by f^{-1} the subset of all (y, x) in B × A
such that (x, y) is in f. The subset f^{-1} is not a mapping of B to A unless f is a
bijective mapping. In case f is a bijective mapping, we call f^{-1} the inverse of f.
For any mapping f of A to B and for any y in B, we denote by f^{-1}(y) the subset of
all elements x in A such that f(x) = y; the subset f^{-1}(y) is called the inverse
image of y.

In Example 1, the mapping f is injective but not surjective; the inverse image
f^{-1}(y) consists of the single element y/2 if y is an even integer and is the empty
subset of Z if y is odd. In Example 2, the mapping f is not injective if 1 ≠ −1 in K,
since f(−x) = x^2 = f(x) for all x, but −1 ≠ 1.
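
The remark about Example 2 can be tried out on small finite fields. The sketch below assumes that the integers mod a prime p form a field (this is shown in Sec. 4); the helper names are hypothetical.

```python
def is_injective(f):
    """f is a set of pairs (x, f(x)); injective means no value is taken twice."""
    values = [y for (_, y) in f]
    return len(values) == len(set(values))

def is_surjective(f, B):
    """Surjective means the image of f is all of B."""
    return {y for (_, y) in f} == set(B)

def inverse_image(f, y):
    """The subset of all x with f(x) = y."""
    return {x for (x, v) in f if v == y}

def squaring_map(p):
    """The mapping x -> x^2 on the integers mod p, returned with its domain."""
    K = set(range(p))
    return {(x, (x * x) % p) for x in K}, K

sq5, K5 = squaring_map(5)   # here 1 != -1, so squaring is not injective
sq2, K2 = squaring_map(2)   # here 1 == -1, and squaring is bijective
```

In the integers mod 5, `inverse_image(sq5, 4)` is `{2, 3}` and `inverse_image(sq5, 2)` is empty, so an inverse image may contain several elements or none.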

The mapping f of A to A such that f(x) = x for all x in A is called the identity
mapping of A. Given a mapping f: A → B and a mapping g: B → C, we define
the composition g ∘ f: A → C as the mapping of A to C given by

    (g ∘ f)(x) = g(f(x))

for all x in A. If f is a bijective mapping of A to B, then clearly

    f^{-1} ∘ f = identity mapping of A
    f ∘ f^{-1} = identity mapping of B


If f is a mapping of a set A to a set B, and A' is a subset of A, the mapping of A'
to B given by x → f(x) for x in A' is called the restriction of f to A'.


EXERCISES 

1. Let A be a set. Let G denote the set of all bijective mappings of A to A. 
Prove that G becomes a group if composition is taken as the binary operation of G. 

2. Let A and B be sets. Let f: A → B and g: B → A be mappings.

(a) If f ∘ g = identity mapping of B, f is surjective.

(b) If g ∘ f = identity mapping of A, f is injective.

(c) Let A' be a subset of A, and let I denote the identity mapping of A'.
Is the composition f ∘ I the same as the restriction of f to A'?


2. Relations 


The considerations which led to the definition of mapping lead to the following 
definition of a relation. 


DEFINITION A relation on a set A is a subset of A × A. We say that the elements
x and y of A are in the relation R if and only if (x, y) is in R.
Notation. Let R be a relation on a set A. We write xRy if and only if
(x, y) is in R.
Thus a relation R is the totality of ordered pairs (x, y) such that xRy.


EXAMPLE 1 Let Z be the system of integers. The relation < is the subset of all
elements (x, y) in Z × Z such that y − x is positive.

A relation R on a set A is called reflexive if xRx for all x in A. A relation R is
called symmetric if yRx whenever xRy, for any x and y in A. A relation R is called
transitive if, for any x, y, z in the set A such that xRy and yRz, we have xRz.

The relation < in Example 1 is transitive, but not reflexive and not symmetric. 


EXAMPLE 2 Let Z be the system of integers, and let ≤ be the relation defined by
x ≤ y if and only if x < y or x = y. The relation ≤ is reflexive and transitive, but
not symmetric.

A relation R on a set A is called an equivalence relation if and only if R is reflexive,
symmetric, and transitive.


EXAMPLE 3 Let A be a set, and let R be the set A × A. Then R is the relation on
A such that xRy for all x and y in A. Obviously R is reflexive, symmetric, and
transitive. Thus R is an equivalence relation.

EXAMPLE 4 Let A be a set, and let R be the subset of all elements (x, x) in A × A
with x in A. R is the relation such that xRy if and only if x = y. This relation is
clearly reflexive, symmetric, and transitive.


EXAMPLE 5 Let J be the system of positive integers, or whole numbers as they are
sometimes called. Let A = J × J. Let Δ be the relation on A given by

    (x, y) Δ (x', y') if and only if x + y' = x' + y

for any x, y, x', y' in J. Then Δ is an equivalence relation.


EXAMPLE 6 Let Z be the system of integers. Let Z* denote the subset of all
nonzero integers. Let A = Z × Z*. Let F be the relation on A given by

    (x, y) F (x', y') if and only if xy' = x'y

for any x, x' in Z and y, y' in Z*. The relation F is an equivalence relation.
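
On a finite set, the three defining properties can be checked by brute force. The sketch below restricts the relation F of Example 6 and the relation < of Example 1 to a small window of integers; the window and the helper names are hypothetical, and a finite check of this kind illustrates the definitions without replacing the proofs asked for in the exercises.

```python
def is_reflexive(R, A):
    return all((x, x) in R for x in A)

def is_symmetric(R):
    return all((y, x) in R for (x, y) in R)

def is_transitive(R):
    return all((x, w) in R for (x, y) in R for (z, w) in R if y == z)

# A finite piece of Z x Z*: pairs (x, y) with -3 <= x, y <= 3 and y != 0.
window = range(-3, 4)
A = [(x, y) for x in window for y in window if y != 0]

# (x, y) F (x', y') if and only if x*y' == x'*y
F = {(a, b) for a in A for b in A if a[0] * b[1] == b[0] * a[1]}

# x < y if and only if y - x is positive
less_than = {(x, y) for x in window for y in window if y - x > 0}
```

On this window F passes all three checks, while `less_than` is transitive but neither reflexive nor symmetric, in agreement with the remarks following Example 1.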


EXERCISES 
1. Prove that the relation Δ in Example 5 is an equivalence relation on J × J.
2. Prove that the relation F in Example 6 is an equivalence relation on Z × Z*.


3. Quotient set


Let R be an equivalence relation on a set A. For any element x in A, we denote
by xR the set of all elements y in A such that xRy. The various subsets xR as x
varies over A satisfy the following two conditions:

    R1. x is in xR for every x in A.
    R2. If xR and yR have a common element, then xR = yR.


The first condition comes from the fact that an equivalence relation is reflexive.
The second condition is proved as follows. Suppose xR and yR each contain the
element z of A. Then xRz and yRz. By the symmetry of equivalence relations,
zRy. By transitivity, xRz and zRy imply xRy. Now if t is in xR, then we obtain
in turn

    xRt
    tRx
    tRy
    yRt

and thus t is in yR. Therefore xR is a subset of yR. By the same argument, yR
is a subset of xR. Hence xR = yR.

The subset xR is called "the R-equivalence class of x." Conditions R1 and R2
assert in effect that each element of A lies in one and only one R-equivalence class.
Thus the R-equivalence classes cover A and partition A into mutually disjoint
subsets.




We now take a conceptual leap forward: we consider the collection of equivalence 
classes and denote this set (or collection) by A/R. That is, each element of the set 
A/R is a subset of A, namely, an R-equivalence class. The set A/R is called the 
quotient set of A mod R. 

The mapping π: A → A/R which assigns to each x in A its R-equivalence class
xR is called the projection of A onto A/R. Clearly π^{-1}(π(x)) is the
R-equivalence class containing x, for any x in A.
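
For a finite set the quotient set and the projection can be computed directly. In the sketch below the relation is passed as a two-argument predicate; the names `quotient_set`, `projection`, and the sample relation (equal remainders mod 3 on {0, ..., 11}) are hypothetical choices made for illustration.

```python
def equivalence_class(A, related, x):
    """The class xR: all y in A with xRy, as a frozenset."""
    return frozenset(y for y in A if related(x, y))

def quotient_set(A, related):
    """The set A/R of all R-equivalence classes."""
    return {equivalence_class(A, related, x) for x in A}

def projection(A, related):
    """The projection A -> A/R, stored as a dictionary x -> xR."""
    return {x: equivalence_class(A, related, x) for x in A}

A = range(12)
same_remainder = lambda x, y: (x - y) % 3 == 0

Q = quotient_set(A, same_remainder)
pi = projection(A, same_remainder)
```

Each element of `Q` is itself a subset of `A`; the classes are pairwise disjoint and their union is `A`, which is the content of conditions R1 and R2.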


EXAMPLE 1† Let G be a group, and let H be a subgroup. We define the relation R
on G as follows:

    xRy if and only if x^{-1}y is in H.

It is a simple exercise to prove that R is an equivalence relation. Moreover, the
R-equivalence class containing an element x is the totality of elements y such that
x^{-1}y is in H. Multiplying on the left by x, we see that this condition is equivalent to

    y is in xH,

where xH denotes the set of all elements xh with h in H. Thus

    xR = xH

We call the subset xH the left H-coset of x. We denote the quotient set G/R
of G mod R by the symbol G/H also, and we call it "G mod H." In case the binary
operation of G is denoted by +, we denote the left H-coset by x + H.


EXAMPLE 2 Let Z be the group of integers with addition as the binary operation.
The even integers 2Z form a subgroup. The quotient set Z/2Z consists of only
two cosets 0 + 2Z and 1 + 2Z, since for any integer x, either x or x − 1 is even
(prove this by the division algorithm!).


EXAMPLE 3 Let Z be the group of integers with addition as the binary operation.
Let n be a positive integer, and let nZ denote the multiples of n. The quotient
space Z/nZ consists of exactly n elements

    nZ, 1 + nZ, 2 + nZ, ..., (n − 1) + nZ

The proof of this assertion is left as an exercise. In Chap. 2, these cosets were
called "residue classes modulo n," and the set of integers 0, 1, 2, ..., n − 1 was
called "a complete set of residues mod n."


EXERCISES 


1. Prove that the relation of Example 1 is an equivalence relation.
2. Prove that the quotient space Z/nZ has exactly n elements.
3. Let G be a group having n elements, and let H be a subgroup having r
elements.
(a) Prove that each right H-coset has exactly r elements.
(b) Prove that r | n.
(c) Prove that r · s = n, where s is the number of elements in G/H.
4. In a group G having n elements, prove that x^n = identity for any x in G.

† See Sec. 10, Chap. 2, and Sec. 4, Chap. 10, for additional examples.


4. Binary operations on quotient sets 


Let G be a group, let H be a subgroup, and let π denote the projection x → xH of
G onto the quotient set G/H. Is it possible to define a binary operation on G/H so
that G/H becomes a group and π becomes a homomorphism of groups?

Suppose that indeed there is such a binary operation in G/H. Since a
homomorphism of groups sends the identity into the identity, π(1) = 1', where 1 and 1'
denote the identity elements of G and G/H, respectively. Since π^{-1}(π(1)) is the
left H-coset containing 1, we have π^{-1}(1') = H. Now for any x in G,

    π(xHx^{-1}) = π(x)π(H)π(x)^{-1} = π(x)π(x)^{-1} = 1'

Hence

    (4.1)  xHx^{-1} is a subset of H, for each x in G.

Condition (4.1) is thus a necessary condition for the existence of a desired binary
operation on G/H.


DEFINITION 4.1 A subgroup H of a group G is called normal if and only if xhx^{-1} is in
H for each x in G and h in H (this concept has been introduced in Chap. 10).


THEOREM 4.1 Let G be a group, let H be a normal subgroup of G, and let π be the
projection of G onto G/H. There is one and only one binary operation on G/H with
respect to which G/H is a group and π a homomorphism of groups.


Proof. For any elements x and y in G, we consider the set xHyH of all
elements in G of the form xhyh' with h and h' in H. Inasmuch as H is a normal
subgroup, we have

    Hy = yH

for all y in G. Thus

    xHyH = xyHH = xyH

That is, the product of two cosets is a coset. Accordingly,

    Φ: (xH, yH) → xyH

is a mapping of G/H × G/H into G/H, and we take Φ as the binary
operation in G/H. Obviously, for all x and y in G, we have

    (4.2)  π(x) · π(y) = π(xy)

when we write the binary operation Φ as multiplication. From Eq. (4.2)
it may be verified directly that the binary operation of G/H satisfies the
conditions imposed by the axioms of a group. Thus, G/H is a group and
π is a homomorphism of groups. The uniqueness of the binary operation
on G/H follows from the formula


    (4.3)  π(u) · π(v) = π(π^{-1}(π(u)) · π^{-1}(π(v)))    for all u, v in G,

which is a restatement of Eq. (4.2).


DEFINITION 4.2 Let G be a group and H a normal subgroup. The quotient group
G/H is the quotient set G/H with the binary operation taken as in Theorem 4.1.


EXAMPLE 1 Let Z denote the group of integers with addition as the binary
operation. Since Z is abelian, every subgroup of Z is normal. Hence we may speak of
the quotient group Z/nZ. It is a group with exactly n elements.


EXAMPLE 2 Let G denote the permutation group of {1, 2, ..., n}. Let H denote
the subgroup of elements of G which keep n fixed. Then two elements x and y of
G lie in the same coset if and only if x^{-1}y is in H, that is,

    y is in xH

This condition is equivalent to

    y(n) = x(n)

That is, the permutations x and y send n into the same integer. Since there are n
possible integers into which n can be sent, we see that G/H has exactly n elements.
Hence by Exercise 3, Sec. 3, we see that

    (number of elements in G) = n × (number of elements in H)

Clearly H is isomorphic to the permutation group of {1, ..., n − 1}. Letting
σ_n denote the number of elements in the permutation group of {1, ..., n}, we
have

    σ_n = n · σ_{n−1} = n(n − 1) · σ_{n−2}

It should be noted that H is not a normal subgroup of G when n > 2 and that G/H
cannot then be made into a group in such a way that π is a homomorphism of groups.
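
The counting argument and the failure of normality can be observed for n = 3. The sketch below is an illustration under its own conventions: permutations act on {0, ..., n − 1} rather than {1, ..., n}, so the fixed symbol is n − 1, and `compose` is a hypothetical helper.

```python
from itertools import permutations

def compose(x, y):
    """The permutation 'first y, then x'; a permutation p is the tuple of images p[i]."""
    return tuple(x[y[i]] for i in range(len(x)))

n = 3
G = list(permutations(range(n)))            # all permutations of {0, ..., n-1}
H = [h for h in G if h[n - 1] == n - 1]     # those keeping the last symbol fixed

left_cosets = {frozenset(compose(x, h) for h in H) for x in G}
right_cosets = {frozenset(compose(h, x) for h in H) for x in G}
```

There are exactly n left cosets, and two permutations lie in the same left coset precisely when they send the last symbol to the same place. The left and right cosets do not coincide, so H is not normal (compare Exercise 1 below).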

We now pose for rings the same question that was posed above for groups:
Given a ring A, and an additive subgroup B (that is, a subgroup with respect to the
binary operation of addition in A), is it possible to define binary operations on the
quotient set A/B so that the projection π: A → A/B becomes a ring homomorphism?
The problem of defining addition has already been solved in the affirmative. For
with respect to addition, A is a commutative group and B is a normal subgroup,
and addition can be defined as in Theorem 4.1. The definition of multiplication is
possible only in a special circumstance.


DEFINITION 4.3 Let A be a ring, and let B be an additive subgroup. B is called an
ideal in A if and only if xb and bx are in B for all x in A and b in B.


EXAMPLE 3 Let Z be the ring of integers, and let n be any integer. Then nZ is an
ideal in Z.


EXAMPLE 4 Let A be any commutative ring, and let x be any element in A. Then
xA is an ideal in A for any x in A; xA is called the principal ideal generated by x.


EXAMPLE 5 Let A be a ring, and let B be a subset of A. We consider the
intersection J of all the ideals of A which contain B. Then J is clearly an ideal. We
call J the ideal generated by the subset B. One must not confuse the ideal generated
by B and the subring of A generated by B. The latter is the intersection of all the
subrings of A which contain B.

Additional examples of ideals may be found in Exercise 3, Sec. 9, Chap. 2; Exercise
2, Sec. 3, Chap. 6; and Exercise 7, Sec. 4.


THEOREM 4.2 Let A be a ring, let B be an ideal, and let π denote the projection of A
onto A/B. There is one and only one way of defining addition and multiplication in
A/B so that π becomes a ring homomorphism. Conversely, if B is an additive subgroup
of A such that binary operations can be defined in A/B so that π is a ring
homomorphism, then the subgroup B is an ideal.

The proof of Theorem 4.2 is similar to the proof of Theorem 4.1 and is left as
an exercise for the reader.


DEFINITION 4.4 Let A be a ring, and let B be an ideal in A. The ring obtained by
taking on A/B the binary operations described in Theorem 4.2 is called the quotient
ring A mod B and is denoted by A/B.


EXAMPLE 6 Let Z be the ring of integers, and let n be an element of Z. Then nZ
is an ideal. Let Z_n denote the quotient ring Z/nZ. Z_n is a ring with n elements.
The zero element of Z_n is the coset nZ. The unit element is the coset 1 + nZ. An
element r + nZ of Z_n has a multiplicative inverse if and only if g.c.d.(r, n) = 1. For
if g.c.d.(r, n) = 1, there are integers a and b such that ar + bn = 1. Applying
the ring homomorphism π: Z → Z_n, we have

    π(a)π(r) = π(1)

Since π(r) = r + nZ, we see that r + nZ has a multiplicative inverse.

Conversely, if r + nZ has a multiplicative inverse, there is a coset s + nZ such
that

    π(s)π(r) = π(1)

This implies that

    π(sr − 1) = π(0)

or

    sr − 1 is in π^{-1}(π(0)) = nZ

that is, there is a z in Z such that

    sr − 1 = nz

From this equation it is immediate that any integer dividing both r and n also
divides 1. Consequently,

    g.c.d.(r, n) = 1

In particular, if p is a prime integer, every nonzero element of the ring Z_p has a
multiplicative inverse. Hence Z_p is a field, for every prime p.
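
The integers a and b with ar + bn = 1 can be produced by the extended Euclidean algorithm, which then yields the inverse of r + nZ. The sketch below assumes n is a positive integer; the function names are hypothetical.

```python
def extended_gcd(r, n):
    """Return (g, a, b) with a*r + b*n == g == g.c.d.(r, n), for r, n >= 0."""
    old_r, cur_r = r, n
    old_a, cur_a = 1, 0
    old_b, cur_b = 0, 1
    while cur_r != 0:
        q = old_r // cur_r
        old_r, cur_r = cur_r, old_r - q * cur_r
        old_a, cur_a = cur_a, old_a - q * cur_a
        old_b, cur_b = cur_b, old_b - q * cur_b
    return old_r, old_a, old_b

def inverse_mod(r, n):
    """A representative of the inverse of r + nZ in Z_n, or None if there is none."""
    g, a, _ = extended_gcd(r % n, n)
    return a % n if g == 1 else None

def units(n):
    """The residues 0 <= r < n whose cosets are invertible in Z_n."""
    return [r for r in range(n) if inverse_mod(r, n) is not None]
```

For instance, `units(12)` lists the residues prime to 12, while for a prime such as 7 every nonzero residue appears, in line with the statement that Z_p is a field.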


EXERCISES 

1. A subgroup H of a group G is normal if and only if each left H-coset is a
right H-coset, that is, xH = Hx for all x in G.

2. Deduce from Eq. (4.2) that the binary operation of G/H is associative.

3. Deduce from Eq. (4.2) that the equation ax = b has a unique solution x for
each a and b in G/H.

4. Let G be a group of permutations of a set X, let x be an element of the set X,
and let G_x denote the set of all permutations g in G such that g(x) = x.

(a) Prove that G_x is a subgroup of G. It is called the stabilizer of x.

(b) Let G(x) denote the set of all elements in X of the form g(x) with g in G. The
subset G(x) is called the orbit of x. Prove that there is a bijective mapping of the
quotient space G/G_x onto the orbit G(x). [Hint: Map the coset gG_x into the
element g(x).]

5. Prove Theorem 4.2. 

6. Let K be a field, and let x be an indeterminate over the field K. Let A
denote the ring of polynomials K[x]. Let p be a prime element (i.e., irreducible
polynomial) in A, and let B denote the principal ideal pA. Prove that the quotient
ring A/B is a field.

7. Let A be a commutative ring. An ideal P in A with P ≠ A is called a
prime ideal if and only if for any x and y in A whose product xy is in P, either x
is in P or y is in P. Prove: Let A be a commutative ring and B an ideal in A.
Then A/B is an integral domain if and only if B is a prime ideal in A.

8. Let A be a commutative ring, and let B be an ideal in A, with B ≠ A. The
quotient A/B is a field if and only if B is a maximal ideal, that is, if B is not
contained in a larger ideal, other than A itself.

9. Prove that every nonzero prime ideal in the ring of integers is maximal.

10. Let K be a field, and let x be an indeterminate over K. Prove that every
nonzero prime ideal in K[x] is maximal.

11. Let V be a vector space over a field K, let U be a subspace of V, and let π
denote the projection of V onto V/U. Prove that operations can be so defined on
V/U that π becomes a linear mapping.

12. Let V and U be as in Exercise 11. Prove dim U + dim V/U = dim V.

13. Let T be a linear mapping of V to V such that T carries U into U. Prove
that

(a) The rule

    x + U → Tx + U

defines a mapping of V/U into V/U which is linear. This mapping is called the
V/U part of T and is denoted by T_{V/U}.

(b) det T = det T_U · det T_{V/U}, where T_U: U → U is the U part of T.

14. Let G be a group, and let f be a homomorphism of G into a group G'. Let
H be a normal subgroup of G lying in the kernel of f. Prove that there is one and
only one homomorphism f̄: G/H → G' such that

    f̄ ∘ π = f

where π is the projection of G onto the quotient group G/H.

15. State and prove the analogue of Exercise 14 for rings. 

16. Prove: The only ideals in a field K are (0) and K.

17. Prove: A homomorphism of a field is either a monomorphism or maps the
field into zero.

18. Prove: The ideal generated by a nonzero element in the ring A of n × n
matrices with coefficients in a field F is the entire ring A.


5. The construction of the field of quotients 
of an integral domain 


Let D be an integral domain (cf. Chap. 2). We wish to construct a field K
satisfying the following conditions:

(1) K contains a subring D' isomorphic to D.

(2) Each element of K is a quotient of elements of D'.

Our construction is as follows. Let D* be the set of nonzero elements in D.
Let A = D × D*. Let F be the relation on A given by (x, y) F (x', y') if and only
if xy' = x'y for any (x, y) and (x', y') in A. By Exercise 2, Sec. 2, the relation F
is an equivalence relation in D × D*. (F is modeled after the equality relation
between fractions.) Set K = A/F. We shall now define addition and
multiplication in K.


Define for any elements (x, y) and (x', y') in A

    (x, y) + (x', y') = (xy' + x'y, yy')
    (x, y) · (x', y') = (xx', yy')

It is readily verified that for any elements a and b in A, the sets
π^{-1}(π(a)) + π^{-1}(π(b)) and π^{-1}(π(a)) · π^{-1}(π(b)) are each contained in a single
F-equivalence class, where π denotes the projection of A onto A/F. Thereby,
one can define addition and multiplication on K by the formulas

    q_1 + q_2 = π(π^{-1}(q_1) + π^{-1}(q_2))
    q_1 q_2 = π(π^{-1}(q_1) · π^{-1}(q_2))

for any q_1 and q_2 in K. One can check, axiom by axiom, that K is a field. The
mapping

    φ: x → π((x, 1))

is a monomorphism of the integral domain D into K. Set D' = φ(D). Since

    (a, b) = (a, 1) · (1, b)

and

    π((1, b)) · π((b, 1)) = π((b, b))
                          = the unity in K

we have

    π((1, b)) = (π((b, 1)))^{-1}

and

    π((a, b)) = π((a, 1)) · (π((b, 1)))^{-1}

that is, each element of K is a quotient of elements in D'.
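
For D = Z the construction can be tried out on pairs of integers. The sketch below only illustrates the formulas above on sample pairs; the function names and the sample values are hypothetical.

```python
def equivalent(a, b):
    """(x, y) F (x', y') if and only if x*y' == x'*y."""
    return a[0] * b[1] == b[0] * a[1]

def add(a, b):
    """(x, y) + (x', y') = (x*y' + x'*y, y*y')."""
    return (a[0] * b[1] + b[0] * a[1], a[1] * b[1])

def mul(a, b):
    """(x, y) . (x', y') = (x*x', y*y')."""
    return (a[0] * b[0], a[1] * b[1])

# Two representatives of each of two classes (think of 1/2 and 2/3).
a, a_alt = (1, 2), (3, 6)
b, b_alt = (2, 3), (-4, -6)
```

Replacing `a` by `a_alt` and `b` by `b_alt` changes the pairs `add(a, b)` and `mul(a, b)` but not their F-equivalence classes, which is why the operations pass to K = A/F. Likewise `mul((1, 5), (5, 1))` is equivalent to `(1, 1)`, a representative of the unity.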


EXERCISE 
Check, axiom by axiom, that K is a field. 


6. The construction of the field of real numbers 
from the field of rational numbers 


Let J be the set of positive integers. Let A be a set. By a sequence of elements
of A we mean a function from J to A. If f is a sequence of elements of A we call
f(1) the first element of the sequence, f(2) the second element of the sequence,
f(3) the third element of the sequence, and f(n) the nth element of the sequence for
any positive integer n. It is customary to write f_n for f(n) and to write (a_1, a_2,
a_3, ..., a_n, ...) for the sequence f with f(1) = a_1, f(2) = a_2, f(3) = a_3, and
more generally f(n) = a_n for each positive integer n.


EXAMPLE 1 The sequence of digits in the decimal expansion .333... for 1/3 is
the constant function f(n) = 3 for all n in J.


EXAMPLE 2 The sequence of rational numbers 


is the function f:J — @ such that 


1 


fn) = gra 


examPLe 3 The sequence of rational numbers 


A sequence of rational numbers (a_1, a_2, ..., a_n, ...) is called a Cauchy
sequence if and only if:
Given any positive rational number ε, there is an integer p such that

    |a_n − a_m| < ε

for all m, n > p.
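
The condition can be explored on finitely many terms with exact rational arithmetic. The sketch below uses a sample sequence that is not taken from the text (the rational iterates a_1 = 1, a_{k+1} = (a_k + 2/a_k)/2); since only finitely many terms are examined, it gives evidence for the Cauchy property but no proof of it.

```python
from fractions import Fraction

def newton_sqrt2(N):
    """The first N terms a_1, ..., a_N of the sample sequence, as exact fractions."""
    a = [Fraction(1)]
    while len(a) < N:
        a.append((a[-1] + 2 / a[-1]) / 2)
    return a   # a[0] is a_1

def tail_index(a, eps):
    """Smallest p such that |a_n - a_m| < eps whenever p < m, n <= len(a)."""
    for p in range(len(a)):
        tail = a[p:]                    # the terms a_{p+1}, ..., a_N
        if max(tail) - min(tail) < eps:
            return p
    return None

terms = newton_sqrt2(8)
p = tail_index(terms, Fraction(1, 1000))
```

Every term is rational and none has square exactly 2, so a sequence of this kind has no limit in Q; sequences like it are what the quotient construction below is designed to accommodate.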

We denote by Q^J the set of all sequences of rational numbers. In Q^J we define
addition and multiplication by the usual rule for the addition and multiplication
of functions:

    (f + g)(n) = f(n) + g(n)
    (f · g)(n) = f(n) · g(n)

for all n in J. The set Q^J is clearly a commutative ring with these operations.
Let A denote the subset of Cauchy sequences in Q^J. It is easily checked that
A is a subring in Q^J.
Let N denote the subset of A consisting of Cauchy sequences (a_1, a_2, ...,
a_n, ...) with lim a_n = 0. It is easily checked that N is an ideal in A. Set

    R = A/N

Let π denote the projection of A onto R. One can easily verify the following:
(1) Any nonzero element in R has a multiplicative inverse. Thus the
commutative ring R is a field.


(2) For any rational number q, let q* denote the constant sequence whose nth
term is q for each positive integer n. Let φ denote the mapping

    q → π(q*)

of Q into R. Then φ is a field monomorphism of Q into R. We set
Q' = φ(Q).

(3) Let R+ denote the subset of R defined as follows:
π(f) belongs to R+ if and only if there is a positive rational number r
and a positive integer p such that

    f(n) ≥ r

for all n > p.

It is not difficult to prove that

(i) If r and s are in R+, then r + s and r · s are in R+.

(ii) If r is in R+, then −r is not in R+.

(iii) If r is not the zero element of the field R, then either r or −r is in R+.

Upon declaring R+ to be the set of positive elements in R, we make R into an
ordered field. It is clear that

(iv) Given any element r in R, there is an element q in the subfield Q' such
that q > r.

The main purpose of our construction of R can now be stated at last. The 
ordered field R has the property 

(v) R is a complete ordered field. 


Proof. Let r_1, r_2, ..., r_n, ... be a Cauchy sequence of elements in
R. Each element r_i is by definition π(a_i), where a_i is an element of A;
that is, a_i is a Cauchy sequence of rational numbers

    a_{i,1}, a_{i,2}, ..., a_{i,k}, ...

For each i, there is an integer p_i such that

    |a_{i,m} − a_{i,n}| < 1/i    for m, n ≥ p_i

Now consider the sequence of rational numbers

    b_1, b_2, ..., b_n, ...

with

    b_n = a_{n,p_n}

We show that b_1, b_2, ..., b_n, ... is a Cauchy sequence in Q. For
given any positive rational ε, we select the integer p so that

    |r_n − r_m| < ε/3    for m, n > p

Next select the integer q so that 1/q < ε/3. Then we have

    |b_n − b_m| = |a_{n,p_n} − a_{m,p_m}|
                = |(a_{n,p_n} − a_{n,i}) + (a_{n,i} − a_{m,i}) + (a_{m,i} − a_{m,p_m})|
                ≤ |a_{n,p_n} − a_{n,i}| + |a_{n,i} − a_{m,i}| + |a_{m,i} − a_{m,p_m}|

for any subscript i. Now by definition of the order in R, we deduce from
|r_n − r_m| < ε/3 that

    |a_{n,i} − a_{m,i}| < ε/3

for all i > g, for some integer g (depending on m and n). Hence for
m, n > p + q, taking i larger than g, p_n, and p_m, we have

    |b_n − b_m| < 1/n + ε/3 + 1/m < ε/3 + ε/3 + ε/3

That is,

    |b_n − b_m| < ε

for all m, n > p + q. Therefore b_1, b_2, b_3, ... is a Cauchy sequence.
Let r denote the element of R that is represented by the sequence
b_1, b_2, ..., b_n, .... It follows directly from definitions that lim r_i = r.

Therefore R is a complete ordered field.


It follows now from Exercise 4, Sec. 3, Chap. 4, that R satisfies the axioms for 
the field of real numbers. 

Note. The constructions of Secs. 5 and 6 reveal that the system of real 
numbers can be constructed from the system of integers. Consequently, 
the assumptions made about the system of real numbers cannot lead to an 
inconsistency or contradiction unless the assumptions about the system 
of integers likewise lead to a contradiction—for any assertion about real 
numbers ultimately can be reformulated as a statement about integers. 

It is for this reason that the mathematician takes the trouble to con- 
struct the real-number system. 

The question of whether the assumptions imposed on the system of
integers can lead to a contradiction has received intensive study by
mathematicians and logicians. By far the most important investigation of this
question was made by Kurt Gödel, in a celebrated memoir published in
1931 in the Monatshefte für Mathematik und Physik.

Suffice it to say that the question of the consistency of the assumptions 
concerning the integers leads to the question of the consistency of logic 
itself, since the system of integers can be constructed from the basic terms 
of logic alone. Such a construction is given for example in Russell and 
Whitehead’s ‘‘Principia Mathematica.” 


The construction of the integers from the terms of logic comes in three
stages. First one defines the system J of whole numbers. Then one
defines in J the binary operations of addition and multiplication. Finally
one extends the system J to the system of integers (negative as well as
positive). Indeed, one can obtain Z as (J × J)/Δ, where Δ is the
equivalence relation defined in Example 5, Sec. 2.


EXERCISES 
1. Prove, in detail, assertions (1), (2), and (3) above. 
2. Prove (i), (ii), (iii), and (iv) above. 
3. Exhibit the bijective mapping of Z onto (J × J)/Δ, and derive the binary
operations of Z from the corresponding operations of J.


7. The construction of a field containing 
a root of a polynomial 


Let $K$ be a field, let $x$ be an indeterminate over $K$, and let $K[x]$ denote the ring of 
polynomials in $x$ with coefficients in $K$. By definition, for any elements $c_0, c_1, \ldots, c_n$ 
in $K$, the polynomial $c_0 + c_1x + c_2x^2 + \cdots + c_nx^n = 0$ (in $K[x]$) if and 
only if $c_0 = c_1 = \cdots = c_n = 0$. If $L$ is a field containing $K$, and $b$ is an 
element of $L$, then the mapping $\theta\colon K[x] \to L$ given by 

$$\theta(c_0 + c_1x + \cdots + c_nx^n) = c_0 + c_1b + \cdots + c_nb^n$$

is a ring homomorphism of $K[x]$ into $L$ called the "$b$-substitution homomorphism." 
An element $b$ in $L$ is called a zero or root of the polynomial $f(x) = c_0 + c_1x + \cdots + c_nx^n$ 
if $c_0 + c_1b + \cdots + c_nb^n = 0$ in $L$, that is, $\theta(f(x)) = 0$. We know that an element $b$ in $K$ is a 
zero of a polynomial $f(x)$ with coefficients in $K$ if and only if $x - b$ divides $f(x)$ in 
the ring $K[x]$; for upon applying the division algorithm, we obtain 

$$f(x) = (x - b)\,q(x) + r$$

with $r$ a constant, and the constant $r = 0$ since 

$$0 = f(b) = 0 \cdot q(b) + r$$

Hence, if $f(x)$ is an irreducible polynomial of degree greater than 1, $f(x)$ has no 
roots in $K$, and $f(b) \neq 0$ in $K$ for all $b$ in $K$. 
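The division step can be checked numerically. The following Python sketch, which assumes rational coefficients and uses the hypothetical name `divide_by_linear`, divides a polynomial by x - b with Horner's scheme and returns the quotient together with the constant remainder r, which equals f(b).

```python
from fractions import Fraction

def divide_by_linear(coeffs, b):
    """Divide f(x) = coeffs[0] + coeffs[1]*x + ... by (x - b).

    Returns (q, r) with f(x) = (x - b) q(x) + r, where q is a coefficient
    list (lowest degree first) and r is a constant.
    """
    acc = Fraction(0)
    rows = []
    for c in reversed(coeffs):       # Horner's scheme, highest degree first
        acc = acc * b + c
        rows.append(acc)
    r = rows.pop()                   # the last accumulated value is r = f(b)
    q = list(reversed(rows))
    return q, r

# f(x) = x^2 - 3x + 2 = (x - 1)(x - 2)
q, r = divide_by_linear([2, -3, 1], 1)   # b = 1 is a root, so r is zero
q3, r3 = divide_by_linear([2, -3, 1], 3) # b = 3 is not a root, r3 = f(3)
```

Here `r` vanishes and `q` represents x - 2, while `r3` is the nonzero value f(3), in agreement with the criterion that b is a root exactly when x - b divides f(x).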

We pose the problem: Given an irreducible polynomial $f(x)$ in $K[x]$, construct a 
field $L$ containing $K$ such that for some $b$ in $L$, 

$$f(b) = 0 \tag{7.1}$$

The solution of this problem is quite simple. Let $B$ denote the ideal $f(x) \cdot K[x]$ 
of the ring $K[x]$ consisting of all the polynomials in $K[x]$ that are divisible by the 
irreducible polynomial $f(x)$. From the unique factorization theorem† in $K[x]$, one 
sees at once that if $g(x)$ and $h(x)$ are in $K[x]$ and the product $g(x) \cdot h(x)$ is in $B$, then 
either $g(x)$ is in $B$ or $h(x)$ is in $B$; that is, $B$ is a prime ideal (cf. Exercise 7, Sec. 4). 
It follows by Exercises 10 and 8, Sec. 4, that the quotient ring $K[x]/B$ is a field. 
Let $\pi$ denote the projection of $K[x]$ onto $K[x]/B$. Then $\pi$ yields a ring homomorphism 
of the subfield $K$ in $K[x]$. Hence, by Exercise 17, Sec. 4, $\pi$ maps $K$ monomorphically 
or $\pi(K) = (0)$. The latter possibility is excluded since 1 is not in the 
kernel of $\pi$. Hence, the field $K[x]/B$ contains a subfield $\pi(K)$ isomorphic to $K$. 

† Theorem 3.6, Chap. 6. 

Set $L' = K[x]/B$, $K' = \pi(K)$, and $c' = \pi(c)$ for any $c$ in $K$. Set $b = \pi(x)$. Then 
we have $L' = \pi(K[x])$, and $L'$ consists of all expressions of the form 

$$a_0' + a_1'b + a_2'b^2 + \cdots + a_n'b^n$$

with $a_0', a_1', \ldots, a_n'$ arbitrary elements of $K'$. Since $B$ is the kernel of the homomorphism 
$\pi$, we have 

$$\begin{aligned}
0 = \pi(f(x)) &= \pi(c_0 + c_1x + c_2x^2 + \cdots + c_mx^m) \\
&= \pi(c_0) + \pi(c_1)\pi(x) + \cdots + \pi(c_m)\pi(x^m) \\
&= c_0' + c_1'b + \cdots + c_m'b^m \\
&= f'(b)
\end{aligned}$$

that is, 

$$f'(b) = 0 \tag{7.2}$$

where 

$$f(x) = c_0 + c_1x + c_2x^2 + \cdots + c_mx^m$$

and 

$$f'(x') = c_0' + c_1'x' + c_2'x'^2 + \cdots + c_m'x'^m$$

It is clear that any polynomial $f(x)$ in $K[x]$ is irreducible in $K[x]$ if and only if the 
corresponding polynomial $f'(x')$ in $K'[x']$ is irreducible in $K'[x']$, where $x$ is an 
indeterminate over $K$ and $x'$ an indeterminate over $K'$. Thus Eq. (7.2) reveals 
that our problem is solved for the field $K'$, which is isomorphic to $K$. 
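As a concrete illustration, the quotient K[x]/B can be modelled in Python by computing with remainders modulo f(x). The sketch below is a minimal one under stated assumptions: K is taken to be the rational numbers, f(x) = x^2 - 2 is a sample monic irreducible polynomial, residue classes are stored as coefficient lists (lowest degree first), and the names `reduce_mod`, `mul`, and `evaluate` are hypothetical.

```python
from fractions import Fraction

def reduce_mod(poly, f):
    """Remainder of poly on division by the monic polynomial f."""
    poly = [Fraction(c) for c in poly]
    n = len(f) - 1                       # degree of f
    while len(poly) > n:
        lead = poly.pop()                # leading coefficient to eliminate
        shift = len(poly) - n            # subtract lead * x^shift * f(x)
        for i in range(n):
            poly[shift + i] -= lead * f[i]
    return poly + [Fraction(0)] * (n - len(poly))

def mul(u, v, f):
    """Product of two residue classes in K[x]/B, where B = f(x) K[x]."""
    prod = [Fraction(0)] * (len(u) + len(v) - 1)
    for i, a in enumerate(u):
        for j, c in enumerate(v):
            prod[i + j] += a * c
    return reduce_mod(prod, f)

def evaluate(coeffs, b, f):
    """Value c_0 + c_1 b + ... + c_m b^m in K[x]/B, by Horner's scheme."""
    acc = [Fraction(0)] * (len(f) - 1)
    for c in reversed(coeffs):
        acc = mul(acc, b, f)
        acc[0] += c
    return acc

f = [-2, 0, 1]                       # f(x) = x^2 - 2 (assumed sample polynomial)
b = [Fraction(0), Fraction(1)]       # b = pi(x), the residue class of x
root_check = evaluate(f, b, f)       # coordinates of f'(b) in the base 1, b
inverse_check = mul(b, [0, Fraction(1, 2)], f)   # b * (b/2)
```

In this model `root_check` is the zero class, which is Eq. (7.2), and `inverse_check` is the class of 1, showing that b has the inverse b/2, as it must in a field.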

In order to solve the problem for the original field $K$, one can resort to a standard 
trick of logic. Let $L$ be any set containing $K$ which can be put in a one-to-one 
correspondence $\varphi$ with $L'$ in such a way that each element $c$ of $K$ corresponds with 
$\pi(c)$ in $K'$; that is, $\varphi\colon L \to L'$ is a bijective mapping and $\varphi(c) = \pi(c)$ for all $c$ in $K$. 
(The existence of such a set $L$, while intuitively obvious, can be proved only after 
digging into the foundations of logic.) Define the binary operations in $L$ to correspond 
to those in $L'$; that is, define 

$$\begin{aligned}
u + v &= \varphi^{-1}(\varphi(u) + \varphi(v)) \\
u \cdot v &= \varphi^{-1}(\varphi(u) \cdot \varphi(v))
\end{aligned}$$

Then $L$ is the desired field.† 

† The reader may find it profitable to consult Exercise 9, Sec. 4, Chap. 6, for a slightly 
different description of $L$. 




By repeating the construction above, one can find a field $L$ which contains all 
the roots of any given polynomial $f(x)$ in $K[x]$. By repeating the construction 
infinitely many times, one can find a field $\bar{K}$ which contains the roots of all the 
polynomials in $K[x]$. The field obtained in this way is algebraically closed; that is, 
any nonconstant polynomial in $\bar{K}[x]$ has a root in $\bar{K}$. 


8. A paradox to avoid 


Given sets $A$ and $B$, we employ the notation $A \sim B$ to mean that there is a bijective 
mapping from $A$ to $B$. Clearly 

$$\begin{aligned}
&A \sim A. \\
&\text{If } A \sim B, \text{ then } B \sim A. \\
&\text{If } A \sim B \text{ and } B \sim C, \text{ then } A \sim C.
\end{aligned} \tag{8.1}$$

The assertions (8.1) seem to say that $\sim$ is an equivalence relation. However, 
we must be very careful to specify on which set $\sim$ is a relation. One is tempted to 
say: $\sim$ is a relation on the set $X$ of all sets. Our reason for not treating $\sim$ as a 
relation on this $X$ is that it is advisable to avoid all mention of $X$. The reason for 
this self-imposed taboo is the following: 

The usual rules of logic lead to a self-contradiction if one applies them to $X$. 

We give a celebrated example due to Bertrand Russell. Let $Y$ be the set of all 
elements $y$ of $X$ satisfying the condition: 

$y$ is not an element of $y$. 

If $Y$ is not an element of $Y$, then $Y$ is an element of $Y$ by definition of $Y$, which is a 
contradiction. If $Y$ is an element of $Y$, then $Y$ is not an element of $Y$, again a 
contradiction. Thus, in any case, we are stuck with a contradiction. 

This logical paradox can be avoided by adopting additional rules of logic which 
would make the foregoing paradox impossible to formulate. 

The rule that is most often adopted by mathematicians is to declare that 
the totality of sets is not a set. Even more, one never uses "all the sets such 
that . . . ," but rather "all the subsets of the set $A$ such that . . . ." With such 
conventions, the Russell paradox cannot be formulated, and we escape, as far as 
we know at present, the spectre of contradictions in logic. 


9. Bernstein's theorem on cardinal numbers 


Two sets A and B are said to have the same cardinal number if and only if A ~ B, 
that is, there is a bijective mapping from A to B. 

Let $J$ denote the set of positive integers and let $J_n$ then denote the subset of 
elements $x$ in $J$ with $x \le n$. We say that the cardinal number of a set $A$ is $n$ if 
and only if $A \sim J_n$. One can prove (by induction) that if $J_m \sim J_n$, then $m = n$. 
Thus if a set $A$ has the same cardinal number as some $J_n$, then the cardinal number 
of $A$ is a unique integer. A set $A$ is called finite if $A \sim J_n$ for some $n$. A set is 
called infinite if it is not finite. It can be proved (cf. Exercise 8 below) that a 
set $A$ is infinite if and only if there is a subset $A'$ in $A$ with $A' \neq A$ and $A' \sim A$. 
For example, the set of integers $Z$ and the set of even integers $2Z$ have the same 
cardinal number, since the mapping $z \to 2z$ is a bijective mapping of $Z$ to $2Z$. 

One of the most important observations about cardinal numbers is given by the 
following theorem of F. Bernstein. 


THEOREM 9.1 Let $f\colon A \to B$ and $g\colon B \to A$ be injective mappings. Then there is a 
bijective mapping $\varphi\colon A \to B$. 


Proof. Set 

$$s = g \circ f, \qquad t = f \circ g$$

The mappings $s$ and $t$ are injective and $f \circ s = f \circ (g \circ f) = (f \circ g) \circ f = 
t \circ f$, that is 

$$f \circ s = t \circ f \tag{9.1}$$

For each non-negative integer $n$, we denote by $s^n$ the composition 
$s \circ s \circ \cdots \circ s$ taken $n$ times, $s^0$ denoting the identity mapping of $A$. 
We define the relation $S$ on $A$ by 

$$x\,S\,y \quad \text{if } y = s^n(x) \text{ or } x = s^n(y) \text{ for some } n$$

The relation $S$ is an equivalence relation on $A$. (Prove this!) Similarly, 
define the equivalence relation $T$ on the set $B$ by 

$$x\,T\,y \quad \text{if } y = t^n(x) \text{ or } x = t^n(y)$$

for some non-negative integer $n$. 
We observe that there are two kinds of $S$-equivalence classes: 
(i) One in which each element $x$ can be expressed as $s(y)$. 
(ii) One which is not of type (i). 
An $S$-equivalence class of type (i) which contains an element $x$ consists 
of the elements 

$$\ldots,\; s^{-2}(x),\; s^{-1}(x),\; x,\; s(x),\; s^2(x),\; \ldots$$

An equivalence class of type (ii) must have an element $x_0$ such that 
$x_0 \neq s(y)$ for any $y$ in $A$; it is clear, therefore, that the infinite sequence 

$$x_0,\; s(x_0),\; s^2(x_0),\; \ldots$$

makes up such an $S$-equivalence class. We call $x_0$ the generator of this 
equivalence class. Clearly an equivalence class of type (ii) has a unique 
generator. 



By virtue of (9.1), one sees that the mapping $f$ maps any $S$-equivalence 
class of type (i) into a $T$-equivalence class of type (i). (Verify this!) 
Therefore $f$ maps each $S$-equivalence class of type (ii) into a $T$-equivalence 
class of type (ii). 

Let $x_0S$ be an $S$-equivalence class of type (ii) with generator $x_0$. The 
image $f(x_0S)$ is contained in a $T$-equivalence class of type (ii), and this 
latter class therefore has a generator $y_0$. We assert: Either 

$$f(x_0) = y_0 \quad \text{or} \quad g(y_0) = x_0 \tag{9.2}$$

For if $f(x_0) \neq y_0$, we know that $f(x_0)$ is not a generator of its $T$-equivalence 
class, and therefore 

$$f(x_0) = t^n(y_0) \qquad (n > 0)$$

Hence 

$$f(x_0) = (f \circ g) \circ (f \circ g)^{n-1}(y_0) = f\bigl(g \circ (f \circ g)^{n-1}(y_0)\bigr)$$

Since $f$ is injective, 

$$x_0 = g \circ (f \circ g)^{n-1}(y_0) = (g \circ f)^{n-1} \circ g(y_0)$$

since 

$$g \circ (f \circ g)^{n-1} = g \circ (f \circ g) \circ \cdots \circ (f \circ g) = (g \circ f) \circ (g \circ f) \circ \cdots \circ g = (g \circ f)^{n-1} \circ g$$

Thus $x_0 = s^{n-1}(g(y_0))$. Since the generator $x_0$ is not of the form $s(y)$, we must have 
$n = 1$, so that $f(x_0) = f \circ g(y_0)$, and 

$$x_0 = g(y_0)$$

As a consequence of assertion (9.2), we can find a bijective mapping 
between any $S$-equivalence class of type (ii) and its corresponding $T$-equivalence 
class. Namely, let $x_0S$ be an equivalence class of type (ii) with 
generator $x_0$, and let $y_0$ be the generator of the equivalence class containing 
$f(x_0S)$. The mapping 

$$s^n(x_0) \to t^n(y_0) \qquad (n = 0, 1, 2, \ldots) \tag{9.3}$$

is a bijective mapping of the equivalence classes. 
On the other hand, if $x$ belongs to an $S$-equivalence class of type (i), 
the mapping 

$$x \to f(x) \tag{9.4}$$

is a bijective mapping of the $S$-equivalence class $xS$ to the $T$-equivalence 
class $f(x)T$. 
Let $\varphi$ be the mapping of $A$ to $B$ that is defined by (9.3) and (9.4) for 
elements in equivalence classes of type (ii) and type (i), respectively. 
Then $\varphi$ is the desired bijective mapping. 
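The construction in the proof can be traced on a small example. The Python sketch below is an illustration only, with hypothetical choices that are not taken from the text: A and B are both the set of non-negative integers, f(a) = 2a, and g(b) = 2b + 1. Then s(a) = 4a + 1, every S-equivalence class is of type (ii), and the mapping of (9.3) can be computed by walking back to the generator.

```python
def f(a):            # injective A -> B (hypothetical choice)
    return 2 * a

def g(b):            # injective B -> A (hypothetical choice)
    return 2 * b + 1

def s(a):            # s = g o f, so s(a) = 4a + 1
    return g(f(a))

def t(b):            # t = f o g, so t(b) = 4b + 2
    return f(g(b))

def phi(x):
    """Bijection A -> B built as in (9.3)."""
    n = 0
    while x % 4 == 1:            # x = s(y) exactly when x = 4y + 1
        x = (x - 1) // 4
        n += 1
    # x is now the generator x0 of its S-equivalence class
    if x % 2 == 1:               # x0 = g(y0): second alternative of (9.2)
        y = (x - 1) // 2
    else:                        # x0 is not in the image of g, so y0 = f(x0)
        y = f(x)
    for _ in range(n):           # phi(s^n(x0)) = t^n(y0)
        y = t(y)
    return y

values = [phi(x) for x in range(203)]
```

On this finite window the values are pairwise distinct, and every element of B up to 100 already appears among them, which is consistent with phi being bijective. Note that neither f nor g is surjective here, yet phi is.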


EXERCISES 

Let $J$, $Z$, $Q$, and $R$ be as in Sec. 6. 

1. Prove $J \sim Z$. 

2. Prove $J \sim Q$. 

3. Let $A$ be a set, let $P(A)$ denote the set of all subsets of $A$, and let $2^A$ denote 
the set of all functions from $A$ to the set of two elements $\{0, 1\}$. Prove $P(A) \sim 
2^A$. (Hint: Consider the mapping $f \to f^{-1}(0)$.) 

4. For any set $A$, prove that $A \sim 2^A$ is impossible. [Hint: Given a mapping 
$\theta\colon A \to 2^A$, let $g$ be an element of $2^A$ such that $g(a) \neq \theta(a)(a)$ for each $a$ in $A$. 
Show that $g$ is not in the image of $\theta$.] 

5. Prove that the relation $S$ defined in Sec. 9 is an equivalence relation. 

6. Prove $R \sim 2^J$. 

7. Deduce from Exercises 4, 5, and 6 that $J \sim R$ is false (cf. corollary to Theorem 4.2, 
Chap. 4). 

8. A set $A$ is infinite if and only if $A \sim A'$ for some proper subset $A'$ of $A$ with 
$A' \neq A$. 

9. Let $A$ be a set, and let $\mathcal{R}$ be the relation on $P(A)$: $B\,\mathcal{R}\,C$ if and only if $B \sim C$. 
Prove that $P(A)/\mathcal{R}$ has a relation $\le$ such that: 

(a) $\pi(B) \le \pi(C)$ if $B$ is a subset of $C$, where $\pi\colon P(A) \to P(A)/\mathcal{R}$ is the projection 
associated with $\mathcal{R}$. 

(b) For any elements $a$ and $b$ in $P(A)/\mathcal{R}$, either $a \le b$ or $b \le a$. 

(c) If $a \le b$ and $b \le a$, then $a = b$. 

10. Does each non-empty subset of $P(A)/\mathcal{R}$ have a least element? 


16 


Tensors 


1. Introduction 


This chapter is concerned with some important systems associated with a vector 
space. First we define the tensor product $U \otimes V$ of two vector spaces over a 
field $K$. Roughly speaking, $U \otimes V$ is the nearest we can come to making the 
set $U \times V$ of all pairs $(u, v)$ into a vector space satisfying 

$$(u_1 + u_2, v) = (u_1, v) + (u_2, v) \quad \text{and} \quad (u, v_1 + v_2) = (u, v_1) + (u, v_2)$$

With this operation we then form spaces $U_0^p = U \otimes U \otimes \cdots \otimes U$ ($p$ times). 
Roughly speaking, again, $U_0^p$ is the smallest vector space which contains all $p$-tuples 
of vectors in $U$. Using the dual space $U^*$ of $U$ we also form spaces $U_q^0 = 
U^* \otimes U^* \otimes \cdots \otimes U^*$ ($q$ times). The elements of $U_q^0$ can be considered as $q$-linear 
functions on $U$; the elements of $U_0^p$ can be considered as $p$-linear functions 
on $U^*$. In this way the study of multilinear operations can be reduced to the 
study of linear operations. Further, we form the spaces $U_q^p = U_0^p \otimes U_q^0$, and 
finally we put these all together into a single gigantic system called the tensor 
algebra of $U$. It can be thought of as consisting of all multilinear operators on 
$U$ and $U^*$. 

Each nonsingular linear transformation of $U$ induces a transformation on each 
$U_q^p$, and the resulting transformations characterize the type of the tensor spaces. 
From the tensor algebra of $U$ we shall derive another system called the exterior 
algebra of $U$. It has important connections with subspaces of $U$. 


2. Tensor products 


In this section we define an important operation on vector spaces. We first state 
two preliminary definitions which will be used frequently in this chapter. 


DEFINITION 2.1 Let $S_1, \ldots, S_r$ be arbitrary sets. Then $S_1 \times S_2 \times \cdots \times S_r$ 
will denote the set of all ordered $r$-tuples $(s_1, s_2, \ldots, s_r)$, with $s_i$ in $S_i$ for $i = 1, 
\ldots, r$. 
The set thus defined is called the cartesian product of $S_1, \ldots, S_r$. 


DEFINITION 2.2 Let $U_1, \ldots, U_r, W$ be vector spaces over a field $K$. A mapping 
$f\colon U_1 \times \cdots \times U_r \to W$ is called $r$-linear† if $f(x_1, x_2, \ldots, x_r)$ is linear in each of 
the $r$ entries, that is, if 

$$\begin{aligned}
f(\ldots, x_i + x_i', \ldots) &= f(\ldots, x_i, \ldots) + f(\ldots, x_i', \ldots) \\
f(\ldots, cx_i, \ldots) &= c \cdot f(\ldots, x_i, \ldots)
\end{aligned} \tag{2.1}$$

for any $x_i, x_i'$ in $U_i$ and any $c$ in $K$, the entries in the places $1, \ldots, i - 1, i + 1, 
\ldots, r$ being arbitrary (but the same on both sides of (2.1), naturally). 

For $r = 1$ this is simply the definition of a linear mapping $U_1 \to W$. For $r = 2$ 
such a mapping $f$ is called bilinear. For example, a bilinear function on a vector 
space $U$, as defined in Chap. 14, is a bilinear mapping $U \times U \to K$, as defined 
above. All $r$-linear mappings $U_1 \times \cdots \times U_r \to W$ form a vector space over $K$, 
with the usual definitions of addition and scalar product of mappings. 


Now let $U$ and $V$ be finite-dimensional vector spaces over the same field of 
scalars $K$ and choose bases $\{u_1, \ldots, u_m\}$ in $U$ and $\{v_1, \ldots, v_n\}$ in $V$, so that 
we assume $\dim U = m$ and $\dim V = n$. Let $P$ be a vector space of dimension 
$m \cdot n$ over $K$. Since there are $m \cdot n$ index pairs $(i, j)$ with $i = 1, \ldots, m$ and 
$j = 1, \ldots, n$, we can use these pairs to index the elements of a base $\{p_{ij}\}$ in $P$. 
That being done, we define a mapping $f\colon U \times V \to P$ as follows: 

$$f(x, y) = x^iy^jp_{ij} \quad \text{where } x = x^iu_i \text{ and } y = y^jv_j \tag{2.2}$$

It is easily verified that $f$ is bilinear, and (2.2) gives $f(u_i, v_j) = p_{ij}$. The image 
of $f$, that is, the set $f(U \times V)$ in $P$, contains the base $\{p_{ij}\}$ and therefore spans $P$. 
Furthermore, $f$ has the following important property: Let $g\colon U \times V \to L$ be any 
bilinear mapping into a vector space $L$. Define a linear mapping $g_1\colon P \to L$ by 
the rule 

$$g_1(p_{ij}) = g(u_i, v_j) \tag{2.3}$$

This defines $g_1$ on the base $\{p_{ij}\}$ in $P$ and therefore determines $g_1$ uniquely by 
linearity. For this mapping $g_1$ we have $g = g_1 \circ f$. For 

$$(g_1 \circ f)(x, y) = g_1(f(x, y)) = g_1(x^iy^jp_{ij}) = x^iy^jg_1(p_{ij}) = x^iy^jg(u_i, v_j)$$

using (2.2), (2.3) and the linearity of $g_1$. On the other hand 

$$g(x, y) = g(x^iu_i, y^jv_j) = x^iy^j \cdot g(u_i, v_j)$$

from the bilinearity of $g$ [that is, from Eq. (2.1)]. Hence we have shown that any 
bilinear mapping $g$ of $U \times V$ can be "factored" into the composition of a linear 
mapping $g_1$ of $P$ and of the fixed bilinear mapping $f$. 
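The factorization g = g_1 ∘ f can be checked in coordinates. The following Python sketch is a minimal illustration under assumptions of its own: U and V are taken to be coordinate spaces of dimensions 2 and 3 with their standard bases, elements of P are stored as m-by-n arrays of coordinates relative to the base {p_ij}, and the sample bilinear mapping g (given by a matrix M) and all function names are hypothetical.

```python
def f(x, y):
    """The bilinear mapping (2.2): entry (i, j) is the coefficient x^i y^j of p_ij."""
    return [[xi * yj for yj in y] for xi in x]

def g(x, y, M):
    """A sample bilinear mapping U x V -> K, with g(u_i, v_j) = M[i][j]."""
    return sum(x[i] * M[i][j] * y[j]
               for i in range(len(x)) for j in range(len(y)))

def g1(p, M):
    """The linear mapping (2.3) on P, fixed by g1(p_ij) = g(u_i, v_j) = M[i][j]."""
    return sum(p[i][j] * M[i][j]
               for i in range(len(p)) for j in range(len(p[0])))

M = [[1, 2, 3], [4, 5, 6]]
x, y = [2, -1], [1, 0, 3]
direct = g(x, y, M)            # g evaluated on the pair (x, y)
factored = g1(f(x, y), M)      # g1 evaluated on f(x, y)
```

Both `direct` and `factored` come out as the same scalar, which is the statement g = g_1 ∘ f for this particular pair.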

The space $P$ with the mapping $f$ is called a tensor product of $U$ and $V$. The 
description we have just given of it suffers from the disagreeable drawback of being 
tied to bases in the vector spaces. We can overcome this by simply taking the 
properties of $f$ cited above as the basis for an axiomatic definition, which we now 
state. 

† Or it is called simply multilinear if there is no need to specify $r$. 


DEFINITION 2.3 By a tensor product of two vector spaces $U$, $V$ over a field $K$ is meant 
a vector space $P$ over $K$ equipped with a fixed bilinear mapping $f\colon U \times V \to P$ having 
the following properties: The image $f(U \times V)$ spans $P$; and if $g\colon U \times V \to L$ is any 
bilinear mapping into a vector space $L$, then there exists a linear mapping $g_1\colon P \to L$ 
such that $g = g_1 \circ f$. 

The existence of a tensor product of two finite-dimensional vector spaces is 
assured by the discussion above. We observe that Definition 2.3 makes sense for 
infinite-dimensional spaces, too, and the existence of tensor products in general can 
be demonstrated by a similar argument. We shall be concerned almost exclusively 
with finite-dimensional spaces in this chapter, and we shall therefore not go into 
the infinite-dimensional case. 


REMARK 1. The tensor-product operation is of great importance in many parts 
of mathematics. With minor modifications Definition 2.3 can be applied to 
other kinds of algebraic structures. For example, if $U$ and $V$ denote any additive 
abelian groups, and if we replace $K$ above by the ring of integers $Z$, then Definition 
2.3 becomes the definition of the tensor product of two abelian groups (for 
this modification the terms linear and bilinear are still to be taken in the sense 
of Definition 2.2, with "vector space" replaced by "additive abelian group" and 
with $K$ replaced by $Z$). 


We start with a very simple proposition which is of use in analyzing Definition 
2.3. 


PROPOSITION 2.1 Let $U$ be a vector space over a field $K$, and let $S$ be a subset which 
spans $U$. If $h_1$ and $h_2$ are linear mappings of $U$ to another vector space, and if $h_1$ and 
$h_2$ have the same effect on elements of $S$, then $h_1 = h_2$. 

Proof. By assumption (cf. Exercise 5, Sec. 5, Chap. 8) every vector $x$ 
in $U$ can be expressed as a linear combination $x = c_1s_1 + \cdots + c_rs_r$ of 
elements in $S$ with coefficients $c_i$ in $K$. Since $h_1$ is linear, $h_1(x) = c_1h_1(s_1) + 
\cdots + c_rh_1(s_r)$, and similarly for $h_2$. By assumption, $h_1(s_i) = h_2(s_i)$ for 
$i = 1, \ldots, r$, whence $h_1(x) = h_2(x)$. Q.E.D. 


PROPOSITION 2.2 The linear mapping gi: P — L of Definition 2.3 is uniquely deter- 
mined by the bilinear mapping g. 

Proof. We have $g = g_1 \circ f$. Let $g_2$ be another linear mapping $P \to L$ 
such that $g = g_2 \circ f$, so that $g_1 \circ f = g_2 \circ f$. Then $g_1$ and $g_2$ have the same 
effect on all elements in the image $f(U \times V)$ in $P$. By assumption, this 
image spans $P$, and therefore $g_1 = g_2$, by Proposition 2.1. Q.E.D. 




According to Definition 2.3, a tensor product of $U$ and $V$ consists of a vector 
space $P$ and a certain bilinear mapping $f$. We shall sometimes indicate a tensor 
product by the notation $\{P, f\}$. 


THEOREM 2.3 Let $\{P, f\}$ and $\{P', f'\}$ be two tensor products of vector spaces $U$ and $V$. 
Then there is one and only one linear mapping $h\colon P \to P'$ such that $f' = h \circ f$, and 
$h$ is an isomorphism. 

Proof. By Definition 2.3, $f$ and $f'$ are bilinear mappings of $U \times V$ into 
$P$ and $P'$, respectively. Taking $g$ in Definition 2.3 to be $f'$, we see from the 
definition that there is a linear mapping $h\colon P \to P'$ such that $f' = h \circ f$; 
and $h$ is unique, by Proposition 2.2. Similarly, there is a unique 
linear mapping $h'\colon P' \to P$ such that $f = h' \circ f'$. Hence, $f = h' \circ (h \circ f) = 
(h' \circ h) \circ f$, and therefore, for any element $t$ in the image $f(U \times V)$ of $f$, 
we must have $t = (h' \circ h)(t)$. Thus $h' \circ h$ has the same effect on elements of 
$f(U \times V)$ as the identity mapping of $P$. By assumption, $f(U \times V)$ spans 
$P$, and so $h' \circ h$ must be the identity mapping of $P$, by Proposition 2.1. 
By a similar argument, $h \circ h'$ is the identity mapping of $P'$. Therefore $h$ 
and $h'$ are inverse mappings and must consequently be one-to-one. Q.E.D. 


The theorem just established shows that any two tensor products of $U$ and $V$ 
are canonically isomorphic, that is, in a specific way. Thus any two tensor products 
are interchangeable, and for this reason we shall usually not distinguish between 
different tensor products of $U$ and $V$. Any one of them $\{P, f\}$ will be denoted by 
$U \otimes V$, called simply the tensor product of $U$ and $V$, and for the given bilinear 
mapping $f$ of $U \times V$ to $P$, that is, to $U \otimes V$, we shall use the notation 

$$f(x, y) = x \otimes y \tag{2.4}$$

With this notation the bilinearity of $f$ is expressed by the following rules, which are 
just restatements of (2.1): 

$$\begin{aligned}
(x + x') \otimes y &= x \otimes y + x' \otimes y \\
x \otimes (y + y') &= x \otimes y + x \otimes y' \\
(cx) \otimes y &= x \otimes (cy) = c \cdot (x \otimes y)
\end{aligned} \tag{2.5}$$

for any $x, x'$ in $U$, any $y, y'$ in $V$, and any $c$ in $K$. Definition 2.3 says that, given 
any bilinear mapping $g\colon U \times V \to L$, there is a linear mapping $g_1\colon U \otimes V \to L$, 
necessarily unique (Proposition 2.2), such that $g = g_1 \circ f$. That is, $g(x, y) = 
g_1(f(x, y))$, or, using (2.4), 

$$g(x, y) = g_1(x \otimes y) \tag{2.6}$$


PROPOSITION 2.4 Let $U \otimes V$ be the tensor product of two vector spaces $U$, $V$ over $K$. 
Then every element of $U \otimes V$ can be expressed in at least one way as a sum 

$$\sum_{i=1}^{s} x_i \otimes y_i$$

with $x_i$ in $U$ and $y_i$ in $V$ ($i = 1, \ldots, s$). 

Proof. Let $U \otimes V$ be the tensor product $\{P, f\}$. By Definition 2.3 
the image $f(U \times V)$ in $P$ spans $P$. Therefore any element $t$ of $P$ can be 
expressed as a linear combination $t = c_1w_1 + \cdots + c_sw_s$, where each $w_i$ is 
in $f(U \times V)$. That is, each $w_i$ can be written $w_i = f(x_i, y_i)$. Now $c_iw_i = 
c_if(x_i, y_i) = f(c_ix_i, y_i)$, since $f$ is bilinear. Writing $x_i' = c_ix_i$, we have 

$$t = f(x_1', y_1) + \cdots + f(x_s', y_s) \quad \text{or} \quad t = x_1' \otimes y_1 + \cdots + x_s' \otimes y_s$$

in the notation of (2.4). This proves our contention. Q.E.D. 


THEOREM 2.5 Let $U$ and $V$ be vector spaces over a field $K$, of dimensions $m$ and $n$, 
respectively. Then their tensor product $U \otimes V$ has dimension $mn$. If $\{u_1, \ldots, u_m\}$ 
and $\{v_1, \ldots, v_n\}$ are bases for $U$ resp. $V$, then the elements $u_i \otimes v_j$ in $U \otimes V$ 
($i = 1, \ldots, m$; $j = 1, \ldots, n$) form a base for $U \otimes V$. 


Proof. Let $f\colon U \times V \to P$ be the bilinear mapping defined in (2.2). 
Then $\{P, f\}$ satisfies Definition 2.3 and can therefore be taken as the tensor 
product $U \otimes V$. Now $P$ has as base the elements $\{p_{ij}\}$ and therefore has 
dimension $mn$. From (2.2) and (2.4) we have $p_{ij} = u_i \otimes v_j$. Q.E.D. 


THEOREM 2.6 Let $U$ and $V$ be two vector spaces over a field $K$. Then there is a unique 
isomorphism from $U \otimes V$ to $V \otimes U$ sending $x \otimes y$ into $y \otimes x$ for any $x$ in $U$ and 
$y$ in $V$. 

Proof. Let $U \otimes V$ be the tensor product $\{P, f\}$, and let $V \otimes U$ be the 
tensor product $\{Q, f'\}$, so that $f'$ is a bilinear mapping of $V \times U$ to $Q$. 
Now the mapping $g\colon V \times U \to P$ defined by $g(y, x) = f(x, y)$ obviously 
satisfies Definition 2.3, with $U$ and $V$ interchanged. Hence $\{P, g\}$ is also 
a tensor product of $V$ and $U$, in that order. By Theorem 2.3 there is a 
unique linear mapping $h\colon P \to Q$, necessarily an isomorphism, such that 
$h \circ g = f'$. Using the notation $f(x, y) = x \otimes y$ and $f'(y, x) = y \otimes x$, this 
last equation gives us $(h \circ g)(y, x) = f'(y, x)$, or $h(g(y, x)) = y \otimes x$, or 
finally $h(x \otimes y) = y \otimes x$. Q.E.D. 

The theorem just proved is a kind of "commutative" law for the tensor product 
of two vector spaces. In the following section we establish an analogous "associative 
law." 


EXERCISES 


1. $U$ being a vector space over $K$, show that there is a unique linear mapping 
$h\colon K \otimes U \to U$ such that $h(1 \otimes x) = x$ for all $x$ in $U$, and show that $h$ is an 
isomorphism. 

2. If $f\colon U_1 \times \cdots \times U_r \to L$ and $g\colon V_1 \times \cdots \times V_s \to M$ are multilinear 
mappings, prove that the mapping $h$ of $U_1 \times \cdots \times U_r \times V_1 \times \cdots \times V_s$ into 
$L \otimes M$ defined by 

$$h(u_1, \ldots, u_r, v_1, \ldots, v_s) = f(u_1, \ldots, u_r) \otimes g(v_1, \ldots, v_s)$$

is $(r + s)$-linear. 

3. Let $U$, $V$ be vector spaces over $K$, and suppose that $V$ is the direct sum of 
subspaces $V_1$ and $V_2$. Show that $U \otimes V$ can be regarded as the direct sum of 
$U \otimes V_1$ and $U \otimes V_2$. 

4. If $f\colon U \to U'$ and $g\colon V \to V'$ are linear mappings of vector spaces over $K$, 
show that there is a unique linear mapping $h\colon U \otimes V \to U' \otimes V'$ such that 
$h(x \otimes y) = f(x) \otimes g(y)$ for all $x$ in $U$ and $y$ in $V$. 

5. Let $f\colon U \times V \to P$ be a bilinear mapping of finite-dimensional vector spaces 
over $K$ such that $f(U \times V)$ spans $P$. Suppose that for every bilinear mapping 
$g\colon U \times V \to K$ there exists a linear mapping $g_1\colon P \to K$ such that $g = g_1 \circ f$. 
Show that $\{P, f\}$ is a tensor product of $U$ and $V$. 

6. Write out in detail the definition of the tensor product $G_1 \otimes G_2$ of additive 
abelian groups. 

*7. For any additive abelian group $G$ prove that there is a unique homomorphism 
$G \otimes Z \to G$ mapping $a \otimes 1$ into $a$ for any $a$ in $G$, and show that the mapping is an 
isomorphism. 

*8. Let $G_1$ be a group of order 3, and let $G_2$ be a group of order 5. Prove that 
$G_1 \otimes G_2$ consists of the zero element alone. 

*9. Let $U$, $V$ be vector spaces over $K$, and let $W$ consist of all mappings $f\colon U \times V 
\to K$ such that $f$ maps all but a finite number of pairs $(x, y)$ into zero. Make $W$ 
into a vector space by the usual definitions of addition of mappings and of scalar 
multiplication. We use the following notation: If $f$ in $W$ maps all pairs $(x, y)$ into 
zero except, say, $(x_1, y_1), \ldots, (x_r, y_r)$, and if $f$ maps the $i$th one of these into the 
scalar $a_i$, then denote $f$ by the symbol 

$$a_1(x_1, y_1) + a_2(x_2, y_2) + \cdots + a_r(x_r, y_r)$$

With this notation let $J$ denote the subspace of $W$ generated by all elements represented 
by symbols of the following types: 

$$\begin{aligned}
&(x + x', y) - (x, y) - (x', y) \\
&(x, y + y') - (x, y) - (x, y') \\
&(ax, y) - (x, ay) \\
&(ax, y) - a(x, y)
\end{aligned}$$

Prove that the quotient space $W/J$ is a tensor product of $U$ and $V$ (see Sec. 10, 
Chap. 9). 

3. Tensor products of more than two factors 


Given vector spaces $U$, $V$, $W$ over a field $K$, we can apply the tensor-product 
operation twice, forming, for example, $(U \otimes V) \otimes W$, etc. In a similar way we 
can form repeated tensor products with any number of factors. In this section we 
shall prove the basic facts. The most important applications in this chapter will 
involve repeated tensor products $(U \otimes U) \otimes U$, etc., of a single space. We start 
with an auxiliary result. 


THEOREM 3.1 Let $U$, $V$, $W$, $L$ be vector spaces over a field $K$, and let $g\colon U \times V \times 
W \to L$ be a 3-linear mapping. Then there is a unique linear mapping $g'\colon (U \otimes V) 
\otimes W \to L$ such that $g(x, y, z) = g'((x \otimes y) \otimes z)$ for any $x$ in $U$, $y$ in $V$, $z$ in $W$. 

Proof. For fixed $z$ in $W$ the mapping $U \times V \to L$ defined by $(x, y) \to 
g(x, y, z)$ is clearly bilinear. Therefore, by definition of $U \otimes V$ and by 
Proposition 2.2, we have 

$$g(x, y, z) = g_1(x \otimes y, z) \tag{3.1}$$

where $g_1$ is a unique linear mapping, depending on $z$, of $U \otimes V \to L$. We 
claim that $g_1$ depends linearly on $z$, that is, that 

$$\begin{aligned}
g_1(t, z + z') &= g_1(t, z) + g_1(t, z') \\
g_1(t, cz) &= c \cdot g_1(t, z)
\end{aligned} \tag{3.2}$$

for any $t$ in $U \otimes V$, any $z, z'$ in $W$, and any $c$ in $K$. To prove this we start 
with 

$$g(x, y, z + z') = g(x, y, z) + g(x, y, z')$$

which follows from the fact that $g$ is multilinear. Using (3.1) on each 
of the three terms, we obtain 

$$g_1(x \otimes y, z + z') = g_1(x \otimes y, z) + g_1(x \otimes y, z') \tag{3.3}$$

Similarly, from the equation $g(x, y, cz) = c \cdot g(x, y, z)$, we get 

$$g_1(x \otimes y, cz) = c \cdot g_1(x \otimes y, z) \tag{3.4}$$

These equations show that (3.2) holds for elements of the type $x \otimes y$ in 
$U \otimes V$. But from Proposition 2.4 we can write any $t$ in $U \otimes V$ as a sum 
of such elements, say $t = \sum_i x_i \otimes y_i$. Replacing $x \otimes y$ in (3.3) by $x_i \otimes y_i$ 
and summing, we have 

$$\sum_i g_1(x_i \otimes y_i, z + z') = \sum_i g_1(x_i \otimes y_i, z) + \sum_i g_1(x_i \otimes y_i, z')$$

Since $g_1$ is linear in its first entry, we obtain for the left member 

$$\sum_i g_1(x_i \otimes y_i, z + z') = g_1\Bigl(\sum_i x_i \otimes y_i,\; z + z'\Bigr) = g_1(t, z + z')$$

Doing the same thing for the other two sums above, we obtain the first 
equation of (3.2). The second equation follows similarly from (3.4). 
Thus we now know that $g_1(t, z)$ is linear in both entries. In other words, 
it is a bilinear mapping of $(U \otimes V) \times W$ to $L$. Hence, by definition of 
$(U \otimes V) \otimes W$, there is a linear mapping $g'\colon (U \otimes V) \otimes W \to L$ such 
that 

$$g_1(t, z) = g'(t \otimes z) \tag{3.5}$$

Combining this with (3.1), we obtain $g(x, y, z) = g_1(x \otimes y, z) = g'((x \otimes 
y) \otimes z)$. Moreover, this last equation determines the linear mapping $g'$ 
uniquely. For one easily sees from Proposition 2.4 applied twice that 
$(U \otimes V) \otimes W$ is spanned by elements of the form $(x \otimes y) \otimes z$. The 
last equation above specifies $g'$ for all elements of that form, hence determines 
$g'$ uniquely, by Proposition 2.1. Q.E.D. 


COROLLARY Under the same hypotheses, there is a unique linear mapping $g'\colon U \otimes 
(V \otimes W) \to L$ such that $g(x, y, z) = g'(x \otimes (y \otimes z))$. 


REMARK 1. One must be wary of trying to define linear mappings of tensor 
products by specifying them on the tensor product of elements. The equation 
$g(x, y, z) = g'(x \otimes (y \otimes z))$ determines the linear mapping $g'$ uniquely, as we 
have just seen. But it does not by any means guarantee that $g'$ exists. The 
trouble is that the element $x \otimes (y \otimes z)$ can be written in many different ways, 
for example, $\tfrac{1}{2}x \otimes (2y \otimes z)$, or $9x \otimes (\tfrac{1}{3}y \otimes \tfrac{1}{3}z)$, etc., and consequently one 
must guard against plausible-looking definitions which are in fact inconsistent. 
For example, we cannot define a linear mapping $f\colon U \otimes U \to U$ by setting 
$f(x \otimes y) = x + y$. For $x \otimes y = (-x) \otimes (-y)$, and the same prescription 
would give $f(x \otimes y) = f((-x) \otimes (-y)) = -x - y$, which is nonsense. 
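The inconsistency in the last example can be seen directly in coordinates. The short Python sketch below is only an illustration: it assumes U is a two-dimensional coordinate space with its standard basis, stores x ⊗ y by its coordinates relative to the base u_i ⊗ u_j as in (2.2), and uses hypothetical names throughout.

```python
def tensor(x, y):
    """Coordinates of x (x) y relative to the base u_i (x) u_j."""
    return [[xi * yj for yj in y] for xi in x]

def neg(v):
    """The vector -v."""
    return [-c for c in v]

x, y = [1, 2], [3, 5]

# The two pairs (x, y) and (-x, -y) give the same element of U (x) U ...
same_tensor = tensor(x, y) == tensor(neg(x), neg(y))

# ... but the proposed prescription assigns them different values.
sum_1 = [a + b for a, b in zip(x, y)]             # x + y
sum_2 = [a + b for a, b in zip(neg(x), neg(y))]   # (-x) + (-y)
```

Since `same_tensor` holds while `sum_1` and `sum_2` differ, no mapping of U ⊗ U, linear or otherwise, can send x ⊗ y to x + y for all x and y.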


We now prove the “associative law’’ for tensor products. 


THEOREM 3.2 Let $U$, $V$, $W$ be vector spaces over a field $K$. Then there is a unique 
linear mapping $h\colon (U \otimes V) \otimes W \to U \otimes (V \otimes W)$ such that $h((x \otimes y) \otimes z) = 
x \otimes (y \otimes z)$ for any $x$ in $U$, $y$ in $V$, $z$ in $W$. Furthermore, $h$ is an isomorphism. 

Proof. It is trivial to check that the mapping $g\colon U \times V \times W \to 
U \otimes (V \otimes W)$ defined by 

$$g(x, y, z) = x \otimes (y \otimes z) \tag{3.6}$$

is 3-linear. Hence, by Theorem 3.1 (with $L = U \otimes (V \otimes W)$), there is a 
unique linear mapping $h\colon (U \otimes V) \otimes W \to U \otimes (V \otimes W)$ satisfying 
$g(x, y, z) = h((x \otimes y) \otimes z)$. Thus, by (3.6), we have $h((x \otimes y) \otimes z) = 
x \otimes (y \otimes z)$. 

To show that $h$ is an isomorphism, one proves by a similar argument, 
using the corollary to Theorem 3.1, that there is a unique linear mapping 
$h'\colon U \otimes (V \otimes W) \to (U \otimes V) \otimes W$ such that $h'(x \otimes (y \otimes z)) = 
(x \otimes y) \otimes z$. Then $h' \circ h$ is a linear mapping of $(U \otimes V) \otimes W$ to itself 
which maps each element $(x \otimes y) \otimes z$ into itself. From Proposition 2.1 it 
follows that $h' \circ h$ is the identity mapping of $(U \otimes V) \otimes W$ to itself, 
since the elements of the form $(x \otimes y) \otimes z$ span $(U \otimes V) \otimes W$, by Proposition 
2.4. Similarly, $h \circ h'$ is the identity mapping of $U \otimes (V \otimes W)$, and 
so both $h$ and $h'$ must be one-to-one, hence are isomorphisms. Q.E.D. 
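In coordinates the associative law is easy to observe. The Python sketch below is a minimal illustration under assumptions of its own: the spaces are coordinate spaces with fixed bases, the coordinates of x ⊗ y relative to the base u_i ⊗ v_j are listed in lexicographic order of the index pairs (i, j), and the name `kron` is hypothetical.

```python
def kron(x, y):
    """Coordinates of x (x) y, listed in lexicographic order of (i, j)."""
    return [xi * yj for xi in x for yj in y]

x, y, z = [1, 2], [3, 0, -1], [5, 7]

left = kron(kron(x, y), z)     # coordinates of (x (x) y) (x) z
right = kron(x, kron(y, z))    # coordinates of x (x) (y (x) z)
```

With this ordering of the bases the two coordinate lists agree entry by entry, so the isomorphism h of Theorem 3.2 is represented by the identity on coordinates. The common length 12 = 2 · 3 · 2 is the dimension predicted by Theorem 2.5 applied twice.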


In view of Theorem 3.2 we shall usually not distinguish in our notation between 
$(U \otimes V) \otimes W$ and $U \otimes (V \otimes W)$, writing simply $U \otimes V \otimes W$. An element 
$(x \otimes y) \otimes z$, or $x \otimes (y \otimes z)$, will be written as $x \otimes y \otimes z$. Similar simplifications 
of notation will be made for products of more than three factors. Thus, 
if $U_1, \ldots, U_r$ are vector spaces over $K$, we denote their tensor product by 
$U_1 \otimes \cdots \otimes U_r$, and so on. 

The following theorem shows that multiple tensor products satisfy conditions 
analogous to those of Definition 2.3: 


THEOREM 3.3 Let $U_1, \ldots, U_r$ be vector spaces over $K$. The mapping $f\colon U_1 \times 
\cdots \times U_r \to U_1 \otimes \cdots \otimes U_r$ defined by 

$$f(x_1, \ldots, x_r) = x_1 \otimes \cdots \otimes x_r \tag{3.7}$$

($x_i$ in $U_i$) is $r$-linear, and its image spans $U_1 \otimes \cdots \otimes U_r$. If $g\colon U_1 \times \cdots \times U_r 
\to L$ is any $r$-linear mapping into a vector space $L$, then there is a unique linear 
mapping $g'\colon U_1 \otimes \cdots \otimes U_r \to L$ such that 

$$g(x_1, \ldots, x_r) = g'(x_1 \otimes \cdots \otimes x_r) \tag{3.8}$$

Proof. This follows easily by induction on $r$. For $r = 1$ it is trivial, 
and for $r = 2$ it is merely a restatement of Definition 2.3. 

We suppose then that the theorem holds for $k$ factors, $k < r$, and we 
deduce that it must also hold for $r$ factors. By assumption, the mapping 
$f'\colon U_1 \times \cdots \times U_{r-1} \to U_1 \otimes \cdots \otimes U_{r-1}$ defined by $f'(x_1, \ldots, 
x_{r-1}) = x_1 \otimes \cdots \otimes x_{r-1}$ is multilinear. The mapping $f$ of (3.7) can be 
written $f(x_1, \ldots, x_r) = f'(x_1, \ldots, x_{r-1}) \otimes x_r$. Using the multilinearity 
of $f'$ and the bilinearity of $\otimes$ [i.e., the rules (2.5)] one very easily 
sees that $f$ is $r$-linear. 

Now the image of $f'$ consists of all elements $x_1 \otimes \cdots \otimes x_{r-1}$ in 
$U_1 \otimes \cdots \otimes U_{r-1}$. By assumption, the image of $f'$ generates $U_1 \otimes 
\cdots \otimes U_{r-1}$. That is, every element of this vector space can be written 
in at least one way as a linear combination of elements of the type 
$x_1 \otimes \cdots \otimes x_{r-1}$. Now by Proposition 2.4, every element of $U_1 \otimes 
\cdots \otimes U_r = (U_1 \otimes \cdots \otimes U_{r-1}) \otimes U_r$ can be expressed in at least one 
way as a finite sum of elements of the type $w \otimes x_r$, with $w$ in $U_1 \otimes 
\cdots \otimes U_{r-1}$ and $x_r$ in $U_r$. Writing each such $w$ as a linear combination 
of elements of the form $x_1 \otimes \cdots \otimes x_{r-1}$ and using the bilinearity of $\otimes$, 
one easily sees that every element of $U_1 \otimes \cdots \otimes U_r$ can be expressed 
as a linear combination of elements of the type $x_1 \otimes \cdots \otimes x_r$. Hence, 
the image of $f$ spans $U_1 \otimes \cdots \otimes U_r$. 

Finally, let $g$ be an $r$-linear mapping $U_1 \times \cdots \times U_r \to L$. For each 
$x_r$ in $U_r$, the mapping $U_1 \times \cdots \times U_{r-1} \to L$ defined by $(x_1, \ldots, 
x_{r-1}) \to g(x_1, \ldots, x_r)$ is an $(r - 1)$-linear mapping. By induction 
assumption, there is a unique linear mapping $g_1\colon U_1 \otimes \cdots \otimes U_{r-1} \to L$, 
depending upon $x_r$, such that 

$$g(x_1, \ldots, x_r) = g_1(x_1 \otimes \cdots \otimes x_{r-1}, x_r) \tag{3.9}$$

Just as in the proof of Theorem 3.1 above, one shows that $g_1$ is a bilinear 
mapping of $(U_1 \otimes \cdots \otimes U_{r-1}) \times U_r \to L$. Therefore, by Definition 
2.3, there is a linear mapping $g'$ of $(U_1 \otimes \cdots \otimes U_{r-1}) \otimes U_r = U_1 \otimes 
\cdots \otimes U_r$ to $L$ such that $g_1(t, x_r) = g'(t \otimes x_r)$ for any $t$ in $U_1 \otimes \cdots \otimes 
U_{r-1}$ and any $x_r$ in $U_r$. Hence, using (3.9), we see that $g'$ satisfies Eq. (3.8). 
Furthermore, (3.8) determines the linear mapping $g'$ uniquely, by Proposition 
2.1 and the fact, established above, that the elements of the form 
$x_1 \otimes \cdots \otimes x_r$ span $U_1 \otimes \cdots \otimes U_r$. Hence, by assuming the theorem 
for $k < r$ factors, we have proved that it holds for $r$ factors. By mathematical 
induction, it follows that Theorem 3.3 holds for all values of $r$. 
Q.E.D. 


COROLLARY Any element in $U_1 \otimes \cdots \otimes U_r$ can be expressed in at least one way 
as a finite sum of elements of the form $x_1 \otimes \cdots \otimes x_r$ with $x_i$ in $U_i$ ($i = 1, \ldots, r$). 

Proof. Since elements of this form span $U_1 \otimes \cdots \otimes U_r$, by Theorem 
3.3, any element of this vector space can be written as a linear combination 
of such elements. That is, any $t$ in $U_1 \otimes \cdots \otimes U_r$ can be 
written as a sum of elements of the type $c(x_1 \otimes \cdots \otimes x_r)$, with $c$ in $K$. 
But $c(x_1 \otimes \cdots \otimes x_r) = (cx_1) \otimes x_2 \otimes \cdots \otimes x_r$, by multilinearity, and 
$cx_1$ is again in $U_1$. The assertion follows at once. Q.E.D. 


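THEOREM 3.4 Let $U_1, \ldots, U_r$ be vector spaces over a field $K$, and let $k$ be an integer
with $1 \le k < r$. Then there is a unique bilinear mapping $g$ from
$(U_1 \otimes \cdots \otimes U_k) \times (U_{k+1} \otimes \cdots \otimes U_r)$ to $U_1 \otimes \cdots \otimes U_r$ such that

$g(x_1 \otimes \cdots \otimes x_k, \; x_{k+1} \otimes \cdots \otimes x_r) = x_1 \otimes \cdots \otimes x_r$

Proof. By definition, the mapping of $(U_1 \otimes \cdots \otimes U_k) \times (U_{k+1} \otimes \cdots \otimes U_r)$
to $(U_1 \otimes \cdots \otimes U_k) \otimes (U_{k+1} \otimes \cdots \otimes U_r)$ defined by
$(u, v) \mapsto u \otimes v$ for $u$ in $U_1 \otimes \cdots \otimes U_k$ and $v$ in $U_{k+1} \otimes \cdots \otimes U_r$ is
bilinear, and it has the required property. The uniqueness follows from
Proposition 2.1 and the fact that $U_1 \otimes \cdots \otimes U_k$ is spanned by elements
of the type $x_1 \otimes \cdots \otimes x_k$, and similarly for $U_{k+1} \otimes \cdots \otimes U_r$. Q.E.D.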


The simple theorem above is a special case of a much more general theorem of 
the same type, whose main difficulty lies in the awkwardness of its statement. 


Let us suppose that the index set $\{1, \ldots, r\}$ is divided into $p$ non-overlapping
subsets $J_1, \ldots, J_p$, all of them ordered in some way. If $J_i$ consists of the integers
$i_1, \ldots, i_k$, let us set

$V_i = U_{i_1} \otimes \cdots \otimes U_{i_k}$

In this way we get $p$ vector spaces. Let us call a simple vector in $V_i$ one of the type
$x_{i_1} \otimes \cdots \otimes x_{i_k}$, with $x_{i_1}$ in $U_{i_1}$, etc.




THEOREM 3.5 There is a unique p-linear mapping $g: V_1 \times \cdots \times V_p \to U_1 \otimes \cdots \otimes U_r$
such that

$g(s_1, \ldots, s_p) = x_1 \otimes x_2 \otimes \cdots \otimes x_r$

if $s_i$ is the simple vector in $V_i$ composed of those $x_j$ on the right corresponding to the
indices in $J_i$.

The proof is easily obtained from Theorem 3.3 and Proposition 2.1 in essentially 
the same way as the proof of Theorem 3.4 above. 


EXERCISES 
1. Let $U_i$ $(i = 1, \ldots, r)$ be finite-dimensional vector spaces over $K$, and let
$B_i = \{u^{(i)}_j\}$ be a base for $U_i$. Prove that the elements
$w_{j_1 \cdots j_r} = u^{(1)}_{j_1} \otimes u^{(2)}_{j_2} \otimes \cdots \otimes u^{(r)}_{j_r}$
form a base for $U_1 \otimes \cdots \otimes U_r$. If $x_i = a^j_{(i)} u^{(i)}_j$ is a vector in $U_i$, show that
$x_1 \otimes \cdots \otimes x_r = a^{j_1}_{(1)} \cdots a^{j_r}_{(r)} \, w_{j_1 \cdots j_r}$.

2. Write out a proof of Theorem 3.5 in the special case $r = 5$, with $J_1 = \{2, 3\}$,
$J_2 = \{4, 1\}$, $J_3 = \{5\}$.

3. Let $U_1, \ldots, U_r$ be vector spaces over $K$. Let $f$ be an r-linear mapping of
$U_1 \times \cdots \times U_r$ to a vector space $P$ such that the image of $f$ spans $P$. Suppose
furthermore that, for any r-linear mapping $g$ of $U_1 \times \cdots \times U_r$ to a vector space
$L$, there is a linear mapping $g_1: P \to L$ such that $g = g_1 \circ f$. Prove that there is
a unique linear mapping $h: P \to U_1 \otimes \cdots \otimes U_r$ such that $h \circ f(x_1, \ldots, x_r) = x_1 \otimes \cdots \otimes x_r$,
and show that $h$ is an isomorphism.


4. Tensor products of mappings 


In this section we show that the tensor-product operation on vector spaces leads 
to a corresponding operation for linear mappings. 


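THEOREM 4.1 Let $f_i: U_i \to V_i$ be a linear mapping of vector spaces over a field $K$,
for $i = 1, \ldots, r$. Then there is a unique linear mapping $f: U_1 \otimes \cdots \otimes U_r \to V_1 \otimes \cdots \otimes V_r$
such that

4.1    $f(x_1 \otimes \cdots \otimes x_r) = f_1(x_1) \otimes \cdots \otimes f_r(x_r)$

for any $x_i$ in $U_i$ $(i = 1, 2, \ldots, r)$.
Proof. It is trivial to verify that the mapping $h: U_1 \times \cdots \times U_r \to V_1 \otimes \cdots \otimes V_r$
defined by $h(x_1, \ldots, x_r) = f_1(x_1) \otimes \cdots \otimes f_r(x_r)$ is
r-linear. By Theorem 3.3 there exists a unique linear mapping
$f: U_1 \otimes \cdots \otimes U_r \to V_1 \otimes \cdots \otimes V_r$ such that
$h(x_1, \ldots, x_r) = f(x_1 \otimes \cdots \otimes x_r)$. Q.E.D.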


DEFINITION 4.1 The linear mapping $f$ of the preceding theorem is called the tensor
product of $f_1, \ldots, f_r$ and is denoted by $f_1 \otimes \cdots \otimes f_r$.


With this notation, Eq. (4.1) reads

4.2    $(f_1 \otimes \cdots \otimes f_r)(x_1 \otimes \cdots \otimes x_r) = f_1(x_1) \otimes \cdots \otimes f_r(x_r)$
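
As a concrete check of (4.2) for two factors, the following sketch works in coordinates. It assumes, which the text above does not state, that each space is some $K^n$ with its standard base, that $x \otimes y$ is represented by the list of products $x^j y^l$, and that the matrix of $f_1 \otimes f_2$ relative to the product bases is then the Kronecker product of the two matrices. The helper names `kron`, `tensor`, and `apply` are illustrative only.

```python
def kron(a, b):
    """Kronecker product: rows indexed by pairs (i, k), columns by pairs (j, l)."""
    return [[a[i][j] * b[k][l] for j in range(len(a[0])) for l in range(len(b[0]))]
            for i in range(len(a)) for k in range(len(b))]

def tensor(x, y):
    """Components of x (x) y, listed in the same pair ordering as kron uses."""
    return [xj * yl for xj in x for yl in y]

def apply(m, v):
    """Apply the matrix m to the column of components v."""
    return [sum(m[i][j] * v[j] for j in range(len(v))) for i in range(len(m))]

f1 = [[1, 2], [3, 4]]           # f1: K^2 -> K^2
f2 = [[0, 1, 2], [1, 0, 1]]     # f2: K^3 -> K^2
x1 = [5, -1]
x2 = [2, 0, 7]

lhs = apply(kron(f1, f2), tensor(x1, x2))          # (f1 (x) f2)(x1 (x) x2)
rhs = tensor(apply(f1, x1), apply(f2, x2))         # f1(x1) (x) f2(x2)
print(lhs == rhs)
```

Both sides agree because the entry of `kron(f1, f2)` in row $(i, k)$ and column $(j, l)$ is the product of the two matrix entries, so the double sum over $(j, l)$ factors into the two separate matrix-vector products.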


This operation on mappings obeys several simple laws. For example, it is 
associative: 


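4.3    $(f_1 \otimes f_2) \otimes f_3 = f_1 \otimes (f_2 \otimes f_3)$


For the left side sends an element $t \otimes x_3$ in $(U_1 \otimes U_2) \otimes U_3$ into
$(f_1 \otimes f_2)(t) \otimes f_3(x_3)$, by (4.2). Taking $t = x_1 \otimes x_2$ we have
$(f_1 \otimes f_2)(x_1 \otimes x_2) = f_1(x_1) \otimes f_2(x_2)$,
by (4.2) again, and so $(f_1 \otimes f_2) \otimes f_3$ sends $(x_1 \otimes x_2) \otimes x_3$ into the element
$(f_1(x_1) \otimes f_2(x_2)) \otimes f_3(x_3)$ in $(V_1 \otimes V_2) \otimes V_3$. Similarly, the right side of (4.3) sends
$x_1 \otimes (x_2 \otimes x_3)$ in $U_1 \otimes (U_2 \otimes U_3)$ into the element
$f_1(x_1) \otimes (f_2(x_2) \otimes f_3(x_3))$ in $V_1 \otimes (V_2 \otimes V_3)$.
If we now identify $(U_1 \otimes U_2) \otimes U_3$ and $U_1 \otimes (U_2 \otimes U_3)$ according
to the conventions made following Theorem 3.2, and similarly for the $V$'s, then
our assertion follows.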

Thus the equality sign in (4.3) is not really correct, since it depends upon our
willingness to replace one vector space $(U_1 \otimes U_2) \otimes U_3$ by another one
$U_1 \otimes (U_2 \otimes U_3)$ which is isomorphic to it in a fixed way (and similarly for the $V$'s).
In principle one could develop a rigorously correct notation which would maintain
the distinction† between $(U_1 \otimes U_2) \otimes U_3$ and $U_1 \otimes (U_2 \otimes U_3)$, etc. But the
satisfaction of having such an impeccable notation would be far outweighed by its
ungainly complexity. We shall therefore go ahead and write = when we really
mean something else.

One easily verifies the following rule, which is a kind of extension of (4.3):

4.4    $(f_1 \otimes \cdots \otimes f_k) \otimes (f_{k+1} \otimes \cdots \otimes f_r) = f_1 \otimes \cdots \otimes f_r$


The rule is a simple consequence of the uniqueness of the indicated linear mappings. 
Another important property of the tensor product is expressed in the following 
theorem. 


THEOREM 4.2 The tensor product $f_1 \otimes \cdots \otimes f_r$ of mappings is a multilinear
operation.
Proof. We recall that the vector space of all linear mappings $U_i \to V_i$
is denoted by Hom $(U_i, V_i)$ (see Sec. 4, Chap. 9). A more precise statement
of the theorem is as follows: The mapping

4.5    Hom $(U_1, V_1) \times \cdots \times$ Hom $(U_r, V_r) \to$ Hom $(U_1 \otimes \cdots \otimes U_r, \; V_1 \otimes \cdots \otimes V_r)$

defined by


† We recall that even the symbol $U_1 \otimes U_2$ itself does not denote a fixed vector space,
denoting as it does any of the infinite number of isomorphic tensor products.


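4.6    $(f_1, \ldots, f_r) \mapsto f_1 \otimes \cdots \otimes f_r$

is an r-linear mapping.
We must prove that

4.7    $f_1 \otimes \cdots \otimes (f_k + f_k') \otimes \cdots \otimes f_r = f_1 \otimes \cdots \otimes f_k \otimes \cdots \otimes f_r + f_1 \otimes \cdots \otimes f_k' \otimes \cdots \otimes f_r$
       $f_1 \otimes \cdots \otimes (c f_k) \otimes \cdots \otimes f_r = c \cdot (f_1 \otimes \cdots \otimes f_k \otimes \cdots \otimes f_r)$

By (4.2) the mapping on the left of the first equation sends $x_1 \otimes \cdots \otimes x_r$
into

4.8    $f_1(x_1) \otimes \cdots \otimes (f_k + f_k')(x_k) \otimes \cdots \otimes f_r(x_r)$

By definition, $(f_k + f_k')(x_k) = f_k(x_k) + f_k'(x_k)$. Putting this in (4.8) and
using the multilinearity of the tensor product in $V_1 \otimes \cdots \otimes V_r$
(Theorem 3.3), we find that (4.8) is equal to

$f_1(x_1) \otimes \cdots \otimes f_k(x_k) \otimes \cdots \otimes f_r(x_r) + f_1(x_1) \otimes \cdots \otimes f_k'(x_k) \otimes \cdots \otimes f_r(x_r)$

Now the right side of the first equation in (4.7) has exactly the same
effect on $x_1 \otimes \cdots \otimes x_r$, as is quickly seen. The correctness of the first
equation of (4.7) then follows from the uniqueness part of Theorem 4.1.
The second equation is proved similarly. Q.E.D.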


THEOREM 4.3 Let $U_1, \ldots, U_r$ be vector spaces over $K$, and let $g_i: U_i \to K$ be a
linear mapping $(i = 1, \ldots, r)$. Then there is a unique linear mapping
$g: U_1 \otimes \cdots \otimes U_r \to K$ such that

$g(x_1 \otimes \cdots \otimes x_r) = g_1(x_1) \cdots g_r(x_r)$

for any $x_i$ in $U_i$ $(i = 1, \ldots, r)$.
Proof. The mapping $U_1 \times \cdots \times U_r \to K$ sending $(x_1, \ldots, x_r)$
into $g_1(x_1) \cdots g_r(x_r)$ is easily seen to be r-linear. The assertion follows
at once from Theorem 3.3. Q.E.D.


The foregoing theorem can be connected with Theorem 4.1 as follows: The
tensor product $g_1 \otimes \cdots \otimes g_r$ is a linear mapping of $U_1 \otimes \cdots \otimes U_r$ to
$K \otimes \cdots \otimes K$ (r times). The mapping $h: K \otimes \cdots \otimes K \to K$ defined by
$c_1 \otimes \cdots \otimes c_r \mapsto c_1 \cdots c_r$ is an isomorphism of these systems regarded as vector
spaces over $K$. The mapping $g$ of Theorem 4.3 is simply the composition of
$g_1 \otimes \cdots \otimes g_r$ and $h$.


EXERCISES 
1. Prove that $h: K \otimes \cdots \otimes K \to K$ as just defined is a vector-space isomorphism.

2. Let $f_i: U_i \to V_i$ and $g_i: V_i \to W_i$ be linear mappings of vector spaces over $K$.
Prove that
$(g_1 \otimes \cdots \otimes g_r) \circ (f_1 \otimes \cdots \otimes f_r) = (g_1 \circ f_1) \otimes \cdots \otimes (g_r \circ f_r)$

3. Let $u_1, \ldots, u_r$ be linearly independent elements of $U$, and let $v_1, \ldots, v_s$
be linearly independent elements of $V$, where both $U$ and $V$ are finite-dimensional
vector spaces over a field $K$. Prove that the $rs$ elements $u_i \otimes v_j$ in $U \otimes V$ are
linearly independent.

4. Let $f: U \to U'$ and $g: V \to V'$ be linear mappings of finite-dimensional
vector spaces over $K$. Prove that
Ker $(f \otimes g)$ = (Ker $f$) $\otimes V + U \otimes$ (Ker $g$)
and
Im $(f \otimes g)$ = (Im $f$) $\otimes$ (Im $g$)

5. Let $T: U \to U$ and $S: V \to V$ be endomorphisms of vector spaces over $K$.
Let $x$ be an eigenvector of $T$, and let $y$ be an eigenvector of $S$. Prove that $x \otimes y$ is
an eigenvector of $T \otimes S$.


5. The tensor algebra of a vector space


We start with a preliminary definition which allows us to embed any given vector 
spaces over a field K in a larger vector space. 

Let $J$ denote an arbitrary set, and suppose that there is assigned to each element
$j$ in $J$ a vector space $U_j$ over $K$. Let $S$ denote the set of all mappings $f$ which
assign to each $j$ in $J$ a vector $f(j)$ in $U_j$ in such a way that

5.1    $f(j)$ is the zero vector in $U_j$ for all but a finite number of $j$ in $J$.

We make $S$ into a vector space over $K$ as follows:

5.2    $(f + f')(j) = f(j) + f'(j)$
       $(cf)(j) = c \cdot f(j)$

for any $f$, $f'$ in $S$, any $j$ in $J$, and any $c$ in $K$. It is quickly seen that $f + f'$ and $cf$
so defined are again elements of $S$, and it is a routine matter to verify that $S$
equipped with these operations is a vector space.


DEFINITION 5.1 The space $S$ is called the direct sum of the family $\{U_j\}$.


Suppose for example that $J$ is the finite set $J = \{1, 2, \ldots, n\}$. Condition (5.1)
is superfluous in this case. An element $f$ of $S$ assigns to each integer $j$ from 1 to $n$
a vector $x_j$ in $U_j$. Thus $f$ is simply the n-tuple $(x_1, \ldots, x_n)$. With this notation,
(5.2) becomes

5.3    $(x_1, \ldots, x_n) + (y_1, \ldots, y_n) = (x_1 + y_1, \ldots, x_n + y_n)$
       $c(x_1, \ldots, x_n) = (cx_1, \ldots, cx_n)$

Hence, for finite sets $J$, the definition of the direct sum coincides with that given
earlier (Exercise 3, Sec. 3, Chap. 8, and Exercise 2, Sec. 3, Chap. 13).


Remark 1. If condition (5.1) is omitted, the resulting vector space is called the
direct product of the $U_j$. If $J$ is finite, then there is no distinction between direct
sum and direct product.




Returning to the general case, let $x_j$ be an element of $U_j$. Denote by $x_j'$ the
element of $S$ defined by

5.4    $x_j'(i) = x_j$ if $i = j$, and $x_j'(i) = 0$ if $i \ne j$

It follows at once that the mapping $U_j \to S$ defined by $x_j \mapsto x_j'$ is a linear mapping
which maps $U_j$ isomorphically onto a subspace $U_j'$ of $S$. Now let $f$ be any element
of $S$ and write $f(j) = x_j$, so that $x_j$ is in $U_j$. By condition (5.1), all but a finite
number of the $x_j$ are zero. Let $x_{j_1}, \ldots, x_{j_r}$ be those which are not zero. From
(5.4) and (5.2) it follows at once that

5.5    $f = x_{j_1}' + \cdots + x_{j_r}'$

Conversely, given any elements $x_{j_1}, \ldots, x_{j_r}$ in $U_{j_1}, \ldots, U_{j_r}$, (5.5) defines an
element $f$ in $S$.
Finally, we simplify our notation by writing $x_j$ instead of $x_j'$. With this step
forward we have the following:

5.6    Any element of the direct sum $S$ can be expressed as a finite sum
       $x_{j_1} + \cdots + x_{j_r}$
       with $x_{j_1}$ in $U_{j_1}$, ..., $x_{j_r}$ in $U_{j_r}$. Furthermore, the expression is unique,
       provided the $x$'s are nonzero and provided the elements $j_1, \ldots, j_r$ of $J$ are
       distinct.
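
The construction of the direct sum can be mirrored in a few lines of code. This is only a sketch under assumptions not made in the text: each $U_j$ is taken to be a coordinate space whose vectors are stored as tuples of numbers, and an element $f$ of $S$ is stored as a dictionary holding only the indices $j$ at which $f(j)$ is nonzero, so that condition (5.1) holds automatically. The names `add`, `scale`, and `embed` are hypothetical.

```python
def add(f, g):
    """(f + g)(j) = f(j) + g(j), as in (5.2); zero entries are not stored."""
    out = dict(f)
    for j, x in g.items():
        s = tuple(a + b for a, b in zip(out.get(j, (0,) * len(x)), x))
        if any(s):
            out[j] = s
        else:
            out.pop(j, None)
    return out

def scale(c, f):
    """(c f)(j) = c * f(j), as in (5.2)."""
    out = {j: tuple(c * a for a in x) for j, x in f.items()}
    return {j: x for j, x in out.items() if any(x)}

def embed(j, x):
    """The element of S that equals x at index j and zero elsewhere, as in (5.4)."""
    return {j: tuple(x)} if any(x) else {}

# An element written as a finite sum of embedded vectors, as in (5.5).
f = add(embed("a", (1, 2)), embed(7, (3,)))
zero = add(f, scale(-1, f))
print(f, zero)
```

Storing only the nonzero entries makes the representation of each element unique, which is the content of the uniqueness clause in (5.6).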


Now let $U$ be an n-dimensional vector space over a field $K$. We recall that the
dual vector space $U^*$ of $U$ is the vector space of all linear mappings $U \to K$;
$U^*$ also has dimension $n$ (see Sec. 4, Chap. 9, and Sec. 2, Chap. 14). We introduce
the following notation:

5.7    $U^p_q = \underbrace{U \otimes \cdots \otimes U}_{p} \otimes \underbrace{U^* \otimes \cdots \otimes U^*}_{q}$

In particular, $U^p_0$ is the tensor product of $U$ with itself $p$ times, and $U^0_q$ is the
tensor product of $U^*$ with itself $q$ times. Thus $U^1_0 = U$ and $U^0_1 = U^*$. We
further define

5.8    $U^0_0 = K$

From all these vector spaces we now build a giant vector space $T(U)$, namely,
their direct sum:

5.9    $T(U)$ = direct sum of all $U^p_q$ $(p, q = 0, 1, 2, \ldots)$


The elements of $T(U)$ are called tensors on $U$. As we have just seen, each $U^p_q$
can be regarded as a subspace of $T(U)$. The elements of $U^p_0$ are called contravariant†
tensors of rank $p$; elements of $U^0_q$ are called covariant† tensors of rank $q$;
elements of $U^p_q$ with $p, q > 0$ are called mixed tensors of type $(p, q)$.
From (5.7) we have

5.10    $U^p_q = U^p_0 \otimes U^0_q$

provided $p > 0$ and $q > 0$. This formula can still be regarded as correct even if
$p$ or $q$ is zero. For example, if $q = 0$, the right-hand side is $U^p_0 \otimes U^0_0 = U^p_0 \otimes K$.
But there is a unique isomorphism from this space to $U^p_0$ mapping $x \otimes c$ into
$cx$ for any $x$ in $U^p_0$ and $c$ in $K$. Therefore $U^p_0 \otimes U^0_0$ can be unambiguously
identified with $U^p_0$. The same is true for $U^0_q$.

$T(U)$ is a vector space over $K$, and we now show that it can be made into a ring.
That is, we define a product in $T(U)$. First we define the product of elements $t$
and $t'$ in the subspaces $U^p_q$ and $U^r_s$, respectively, and their product will be in
$U^{p+r}_{q+s}$. Thus we require a bilinear mapping

5.11    $U^p_q \times U^r_s \to U^{p+r}_{q+s}$

It is easily obtained as follows: the mapping

5.12    $U^p_q \times U^r_s \to U^p_q \otimes U^r_s$

given by

5.13    $(t, t') \mapsto t \otimes t'$

is bilinear, by definition of the tensor product. Using (5.10) we have
$U^p_q \otimes U^r_s = U^p_0 \otimes U^0_q \otimes U^r_0 \otimes U^0_s$, provided we make the appropriate identifications
required in (5.10) if any of the indices $p, q, r, s$ are zero. From Theorem 2.6 one
easily shows that there is a unique isomorphism

5.14    $U^p_0 \otimes U^0_q \otimes U^r_0 \otimes U^0_s \to U^p_0 \otimes U^r_0 \otimes U^0_q \otimes U^0_s = U^{p+r}_{q+s}$

such that

5.15    $x \otimes y^* \otimes z \otimes w^* \mapsto x \otimes z \otimes y^* \otimes w^*$

for any $x$ in $U^p_0$, $y^*$ in $U^0_q$, $z$ in $U^r_0$, and $w^*$ in $U^0_s$.

We now define (5.11) to be the composition of (5.12) and (5.14), and we shall
continue to denote it by the tensor-product symbol, even though that will conflict
slightly with our earlier notation because of the interchange of factors involved in
(5.14). Hence, for $t$ in $U^p_q$ and $t'$ in $U^r_s$, we denote the element in $U^{p+r}_{q+s}$ obtained
from (5.12) and (5.14) by $t \otimes t'$. Thus, for $t = x \otimes y^*$ and $t' = z \otimes w^*$ we have


† These are purely conventional terms borrowed from differential geometry. We observe
that, since the dual of U* can be identified with U (Sec. 2, Chap. 18), the space T(U*) 
can be identified with T(U) in a specific way. Which tensors are called contravariant and 
which are called covariant depends upon whether one starts with U or U*. 


5.16    $(x \otimes y^*) \otimes (z \otimes w^*) = x \otimes z \otimes y^* \otimes w^*$


If any of the indices $p, q, r, s$ are zero, this formula is to be interpreted as follows:
A tensor product of a scalar and any tensor is understood to be ordinary scalar
multiplication. For the identification of $U^p_0 \otimes U^0_0$ with $U^p_0$ required to make (5.10)
valid when $q = 0$ amounts to replacing $x \otimes c$ by $cx$; similarly, if $p = 0$, the
identification of $U^0_0 \otimes U^0_q$ with $U^0_q$ is achieved by replacing $c \otimes y^*$ by $cy^*$, etc.

To complete the definition of the product in $T(U)$ we use (5.6): Any element $t$
in $T(U)$ can be expressed as a finite sum $t = t_1 + \cdots + t_k$ of homogeneous elements,
i.e., elements in various of the subspaces $U^p_q$. If $t' = t_1' + \cdots + t_l'$ is
another element of $T(U)$, also expressed as a sum of elements in the subspaces
$U^p_q$, then we define

5.17    $t \otimes t' = \sum_{i,j} t_i \otimes t_j'$

each term on the right being given by (5.16). From the uniqueness part of (5.6)
one shows easily that (5.17) does not depend upon the particular expressions for
$t$ and $t'$ as sums of homogeneous elements.


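THEOREM 5.1 The system $T(U)$, with product defined by (5.16) and (5.17), is a
K-algebra. Every element of $T(U)$ can be expressed as a finite sum of elements of the
type

5.18    $x_1 \otimes \cdots \otimes x_p \otimes y_1^* \otimes \cdots \otimes y_q^*$    ($x_i$ in $U$, $y_j^*$ in $U^*$)

and the product in $T(U)$ of two such elements is given by

5.19    $(x_1 \otimes \cdots \otimes x_p \otimes y_1^* \otimes \cdots \otimes y_q^*) \otimes (z_1 \otimes \cdots \otimes z_r \otimes w_1^* \otimes \cdots \otimes w_s^*)$
        $= x_1 \otimes \cdots \otimes x_p \otimes z_1 \otimes \cdots \otimes z_r \otimes y_1^* \otimes \cdots \otimes y_q^* \otimes w_1^* \otimes \cdots \otimes w_s^*$

Furthermore, the contravariant tensors in $T(U)$ form a subalgebra

5.20    $T_0(U)$ = direct sum of $U^p_0$ $(p = 0, 1, 2, \ldots)$ of $T(U)$

and the covariant tensors form a subalgebra

5.21    $T^0(U)$ = direct sum of $U^0_q$ $(q = 0, 1, 2, \ldots)$ of $T(U)$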


We recall that a K-algebra is a vector space over $K$ with a product operation
which makes it a ring and which satisfies the associative law (4.3) of Sec. 4,
Chap. 8. It is a routine matter to verify that the product $\otimes$ in $T(U)$ meets these
requirements. The fact that any element in $T(U)$ can be expressed as a sum of
elements (5.18) follows at once from the corollary of Theorem 3.3 and from (5.6).
The formula (5.19) is an expanded version of (5.16). It is clear that $T_0(U)$ defined
by (5.20) is a subspace of $T(U)$, and that the product of any two elements of
$T_0(U)$ is again in $T_0(U)$. The same is true for $T^0(U)$.
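
To make the product in the subalgebra $T_0(U)$ tangible, here is a minimal sketch in components. It assumes a fixed base $e_1, \ldots, e_n$ of $U$ (bases are treated only in the next section) and stores a contravariant tensor as a dictionary sending an index tuple $(i_1, \ldots, i_p)$ to the coefficient of $e_{i_1} \otimes \cdots \otimes e_{i_p}$; the empty tuple carries the scalar part in $U^0_0 = K$. The function name `product` is illustrative.

```python
def product(t, u):
    """Product of two elements of T_0(U): by (5.17) and (5.19), extend bilinearly
    the rule that multiplies base monomials by concatenating their index tuples."""
    out = {}
    for I, a in t.items():
        for J, b in u.items():
            out[I + J] = out.get(I + J, 0) + a * b
    return {I: c for I, c in out.items() if c != 0}

x = {(1,): 1, (2,): 2}      # e_1 + 2 e_2, an element of U
y = {(): 3, (1,): 1}        # 3 + e_1, a sum of homogeneous elements of ranks 0 and 1
z = {(2,): 1}               # e_2

xy = product(x, y)
print(xy)
print(product(product(x, y), z) == product(x, product(y, z)))   # associativity
print(product(x, z) == product(z, x))                           # not commutative
```

The scalar component multiplies as ordinary scalar multiplication because concatenating with the empty tuple changes nothing, which matches the interpretation of (5.16) when an index is zero.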




DEFINITION 5.2 $T_0(U)$ is called the contravariant tensor algebra over $U$, and $T^0(U)$
is called the covariant tensor algebra over $U$; $T(U)$ is called simply the tensor
algebra over $U$.


THEOREM 5.2 Let $U$ and $V$ be finite-dimensional vector spaces over a field $K$, and
let $f: U \to V$ be a linear mapping. Then there is a uniquely determined homomorphism
$f_0: T_0(U) \to T_0(V)$ of the contravariant tensor algebras such that

5.22    $f_0(x) = f(x)$ for any $x$ in $U = U^1_0$

Furthermore, there is a uniquely determined homomorphism $f^0: T^0(V) \to T^0(U)$ of the
covariant tensor algebras such that

5.23    $f^0(y^*) = {}^t f(y^*)$ for any $y^*$ in $V^* = V^0_1$,

where ${}^t f$ denotes the transpose† of $f$.


Proof. First of all, a homomorphism of algebras is a mapping which is
compatible with all three of the operations in sight, namely, sum, scalar
multiplication, and product. In particular, $f^0$ and $f_0$ must be linear
mappings.

Consider first $f_0$: For it to be compatible with the product operation
we must have $f_0(x_1 \otimes x_2) = f_0(x_1) \otimes f_0(x_2)$ for any $x_1, x_2$ in $U$, and from
(5.22) we get $f_0(x_1 \otimes x_2) = f(x_1) \otimes f(x_2)$. By a simple induction, one
finds similarly the requirement

5.24    $f_0(x_1 \otimes \cdots \otimes x_p) = f_0(x_1) \otimes \cdots \otimes f_0(x_p) = f(x_1) \otimes \cdots \otimes f(x_p)$

for any $x_1, \ldots, x_p$ in $U$. By Theorem 4.1 there is a unique linear
mapping $U^p_0 \to V^p_0$ satisfying this condition, namely, $f \otimes \cdots \otimes f$
($p$ times). Since $f_0$ is also linear, it follows at once that

5.25    $f_0 = \underbrace{f \otimes \cdots \otimes f}_{p}$ on $U^p_0$

This shows at once that $f_0$ is uniquely determined. Conversely it shows
us how to construct $f_0$. Namely, any $t$ in $T_0(U)$ can be expressed as a sum
$t = t_1 + \cdots + t_k$ of elements in the various subspaces $U^p_0$, say $t_i$ in
$U^{p_i}_0$. We then define

5.26    $f_0(t) = f_0(t_1) + \cdots + f_0(t_k)$

with

† We denote the transpose here by ${}^t f$ instead of $f^*$. The latter symbol is used below for
another purpose.




5.27    $f_0(t_i) = (\underbrace{f \otimes \cdots \otimes f}_{p_i})(t_i)$

This defines $f_0$ uniquely, and it is easily verified that $f_0$ is a homomorphism
as claimed. The assertion concerning $f^0$ is proved similarly; the
mapping ${}^t f$ involved is discussed in Sec. 4, Chap. 9, and Sec. 2, Chap. 18.
Q.E.D.


The preceding theorem leads to an important element of structure underlying 
the tensor spaces associated with a vector space U. Let us denote by GL(U) 
the group of all invertible linear mappings of U to itself (this group is called the 
general linear group on U). Then GL(U) operates on U as a group of transforma- 
tions, in the sense of Definition 5.1, Chap. 10. We show now that GL(U) acts in 
a natural way as a group of transformations on the entire tensor algebra T(U). 
According to Sec. 5, Chap. 10, we must produce a homomorphism $\lambda$ of $GL(U)$ to
the group of all one-to-one mappings of T(U) to itself. 

First of all, for any invertible linear mapping $f$ of $U$ to itself we define

5.28    $f^* = {}^t(f^{-1})$

We claim that $f^*$ is an invertible linear mapping of $U^*$ to itself ($U^*$ denotes the
dual space of $U$), and that

5.29    $(fg)^* = f^* g^*$

for any $f, g$ in $GL(U)$.
To show this, recall that the transpose ${}^t f$ of any linear mapping $f$ of $U$ to itself
is the unique linear mapping of the dual $U^*$ to itself such that

5.30    $\langle {}^t f(x^*), y \rangle = \langle x^*, f(y) \rangle$

for any $x^*$ in $U^*$ and any $y$ in $U$. In particular, for the identity mapping $I$ of
$U$ we have $\langle {}^t I(x^*), y \rangle = \langle x^*, I(y) \rangle = \langle x^*, y \rangle$, and it follows that ${}^t I$ is the identity
mapping of $U^*$. Now replace $x^*$ in (5.30) by ${}^t g(x^*)$. Since $({}^t f \, {}^t g)(x^*) = {}^t f({}^t g(x^*))$,
by definition of composition, Eq. (5.30) gives us
$\langle ({}^t f \, {}^t g)(x^*), y \rangle = \langle {}^t f({}^t g(x^*)), y \rangle = \langle {}^t g(x^*), f(y) \rangle = \langle x^*, g(f(y)) \rangle = \langle x^*, (gf)(y) \rangle = \langle {}^t(gf)(x^*), y \rangle$,
this holding for all $x^*$ in $U^*$ and all $y$ in $U$. From this we conclude that

5.31    ${}^t(gf) = {}^t f \, {}^t g$.

Replace $g$ here by $f^{-1}$. The result is ${}^t I = {}^t f \, {}^t(f^{-1})$. But ${}^t I$ is the identity mapping
of $U^*$, and therefore ${}^t(f^{-1})$ is the inverse of ${}^t f$. That is, ${}^t(f^{-1}) = ({}^t f)^{-1}$. In particular,
${}^t f$ is invertible if $f$ is invertible. Referring now to (5.28), we have
$(fg)^* = {}^t((fg)^{-1}) = {}^t(g^{-1} f^{-1})$, by the general rule for inverses; from (5.31) there results
$(fg)^* = {}^t(f^{-1}) \, {}^t(g^{-1})$, which proves (5.29).
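
The rules (5.31) and (5.29) can be checked numerically. The sketch below assumes $U = K^2$ with $K$ the rational numbers, identifies a mapping with its matrix relative to the standard base, and uses the standard fact that the transpose mapping then has the transposed matrix relative to the dual base. The helpers `matmul`, `transpose`, `inverse2`, and `star` are illustrative names, and `inverse2` handles only invertible 2 x 2 matrices.

```python
from fractions import Fraction

def matmul(a, b):
    """Matrix of the composition a o b."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]

def transpose(a):
    return [list(row) for row in zip(*a)]

def inverse2(a):
    """Inverse of an invertible 2 x 2 matrix, computed exactly."""
    (p, q), (r, s) = a
    det = Fraction(p * s - q * r)
    return [[s / det, -q / det], [-r / det, p / det]]

def star(a):
    """f* = transpose of the inverse of f, as in (5.28)."""
    return transpose(inverse2(a))

f = [[2, 1], [1, 1]]
g = [[1, 3], [0, 1]]

# (5.31): transposition reverses the order of composition.
print(transpose(matmul(g, f)) == matmul(transpose(f), transpose(g)))
# (5.29): passing to f* restores the original order.
print(star(matmul(f, g)) == matmul(star(f), star(g)))
```

Taking the inverse and taking the transpose each reverse the order of a product, so doing both preserves it; this is why $f \mapsto f^*$ is again a homomorphism.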

Now write $f^p_0$ for the restriction of $f_0$ of Theorem 5.2 to elements of $U^p_0$. Thus,
by (5.25),

5.32    $f^p_0 = \underbrace{f \otimes \cdots \otimes f}_{p}$ on $U^p_0$

The second part of Theorem 5.2, applied to a mapping $g: U \to U$, asserts the
existence of a unique ring-homomorphism $g^0$ of $T^0(U)$ to itself such that
$g^0 = {}^t g$ on $U^0_1 = U^*$. By simple considerations analogous to those leading to (5.25),
we have

5.33    $g^0 = \underbrace{{}^t g \otimes \cdots \otimes {}^t g}_{q}$ on $U^0_q$

For $g$ here take $f^{-1}$ and denote the restriction of the resulting $g^0 = (f^{-1})^0$ to $U^0_q$
by $f^0_q$. According to (5.28) and (5.33) we have

5.34    $f^0_q = \underbrace{f^* \otimes \cdots \otimes f^*}_{q}$ on $U^0_q$

We now define

5.35    $f^p_q = f^p_0 \otimes f^0_q$

for any $f$ in $GL(U)$. This is an isomorphism from $U^p_q$ to itself, and

5.36    $f^p_q = \underbrace{f \otimes \cdots \otimes f}_{p} \otimes \underbrace{f^* \otimes \cdots \otimes f^*}_{q}$

Using (5.29) one shows easily that

5.37    $(fg)^p_q = f^p_q \, g^p_q$

for any $f, g$ in $GL(U)$. Hence we have $GL(U)$ operating as a group of linear
transformations on $U^p_q$.
Finally, define a linear mapping $\lambda(f)$ of $T(U)$ to itself as follows: If $t$ is any
tensor in $T(U)$ and if we write it in the form $t = t_1 + \cdots + t_k$, where each $t_i$
is an element of some $U^{p_i}_{q_i}$, then we define

$\lambda(f)(t) = \sum_i f^{p_i}_{q_i}(t_i)$

(the mapping $\lambda(f)$ is simply the direct sum of the $f^p_q$). From (5.37) there follows

$\lambda(fg) = \lambda(f) \lambda(g)$

for any $f, g$ in $GL(U)$, and the correspondence $f \mapsto \lambda(f)$ establishes the operation
of $GL(U)$ as a group of linear transformations on $T(U)$. Moreover it is easily
seen, using Theorem 5.2, that each $\lambda(f)$ is in fact an isomorphism of the tensor
algebra $T(U)$ to itself.


DEFINITION 5.3 Let $U$ and $W$ be vector spaces over the same field, and let there be
given a fixed operation $h$ of $GL(U)$ as a group of linear transformations of $W$ to
itself (that is, $h$ is a homomorphism from $GL(U)$ to $GL(W)$). We call $W$, with the
operation $h$, a space of tensors of type $(p, q)$ on $U$ if and only if there is a linear
mapping $\sigma: W \to U^p_q$ whose kernel is zero and for which

5.38    $\sigma(h(f)w) = f^p_q(\sigma(w))$

for all $w$ in $W$ and all $f$ in $GL(U)$.
This last condition states that $\sigma$ is compatible with the operation of $GL(U)$ in
$W$ and $U^p_q$.


EXAMPLE Let $A_r$ denote the space of all r-linear functions on a vector space $U$
over a field $K$. That is, $A_r$ is the space of all r-linear mappings

$\underbrace{U \times \cdots \times U}_{r} \to K$

In Theorem 7.1 below it is shown that any r-linear function $w$ on $U$ determines
a unique element $w'$ in $U^0_r$, and that the mapping $\sigma_r: A_r \to U^0_r$ that sends $w$ into
$w'$ is an isomorphism.
We define an operation $h$ of $GL(U)$ on $A_r$ as follows, writing simply $f_r$ for $h(f)$.
Thus $f_r$ denotes the action of $f$ on $A_r$, and it is given by the formula

5.39    $f_r w(x_1, \ldots, x_r) = w(f^{-1}(x_1), \ldots, f^{-1}(x_r))$

for any $f$ in $GL(U)$, any $w$ in $A_r$, and any $x_1, \ldots, x_r$ in $U$. One verifies easily
that $(fg)_r = f_r g_r$ for $f$ and $g$ in $GL(U)$, and it is easy to show furthermore that

$\sigma_r(f_r(w)) = f^0_r(\sigma_r(w))$

which is just condition (5.38). Therefore $A_r$ is a space of tensors of type $(0, r)$
on $U$.


EXERCISES 

1. Let $U$ be a one-dimensional vector space over $K$. Prove that the contravariant
tensor algebra $T_0(U)$ is isomorphic to the polynomial algebra $K[t]$ in
one variable $t$ over $K$.

2. If $f: U \to V$ is injective (i.e., has kernel zero), prove that $f_0$ of Theorem
5.2 is injective. If $f$ is surjective, prove that $f_0$ is surjective and that $f^0$ is
injective. ("$f$ is surjective" means that Im$(f) = V$, and similarly for $f_0$.)

3. If dim $U = n$, prove that dim $U^p_q = n^{p+q}$.

4. Let $GL(U)$ operate on Hom $(U, U)$ by: $x \mapsto g^{-1} x g$. Show that Hom $(U, U)$
is thereby a space of tensors of type (1, 1).


6. Bases and components 


Again let $U$ denote a finite-dimensional vector space over a field $K$, say dim $U = n$,
and let $T(U)$ be its tensor algebra, as defined in the preceding section. We
shall show that a base in $U$ gives rise to a base in each of the subspaces $U^p_q$ of
$T(U)$.




Let $B = \{e_1, \ldots, e_n\}$ be a base in $U$. Associated with $B$ is the so-called
dual base $B^* = \{e^1, \ldots, e^n\}$ in the dual space $U^*$. We recall that $B^*$ is defined
by the conditions

6.1    $\langle e^i, e_j \rangle = \delta^i_j$, where $\delta^i_j = 1$ if $i = j$ and $\delta^i_j = 0$ if $i \ne j$

where in general $\langle f, x \rangle$ denotes the element $f(x)$ in $K$ into which a linear function
$f$ on $U$ maps a vector $x$ (see Sec. 4, Chap. 9, and Sec. 2, Chap. 14).

By repeated application of Theorem 2.5 one easily shows that the elements

6.2    $e^{j_1 \cdots j_q}_{i_1 \cdots i_p} = e_{i_1} \otimes \cdots \otimes e_{i_p} \otimes e^{j_1} \otimes \cdots \otimes e^{j_q}$

form a base for $U^p_q$, where the $i$'s and $j$'s range independently from 1 to $n$. Thus
dim $U^p_q = n^{p+q}$. Therefore any element $t$ in $U^p_q$ can be written uniquely as

6.3    $t = t^{i_1 \cdots i_p}_{j_1 \cdots j_q} \, e^{j_1 \cdots j_q}_{i_1 \cdots i_p}$

where the $t^{i_1 \cdots i_p}_{j_1 \cdots j_q}$ are elements of $K$. They are the components of $t$ relative to
the base (6.2). Since the base (6.2) is uniquely determined by the base $B$ in $U$,
we also call the $t^{i_1 \cdots i_p}_{j_1 \cdots j_q}$ the components of $t$ relative to $B$.


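To give an example, let $x_1, \ldots, x_p$ be elements of $U$, and let $y_1^*, \ldots, y_q^*$
be elements of $U^*$. Let

$x_a = x_a^i e_i$,    $y_b^* = y^b_j e^j$

Then

$x_1 \otimes \cdots \otimes x_p \otimes y_1^* \otimes \cdots \otimes y_q^* = (x_1^{i_1} e_{i_1}) \otimes \cdots \otimes (x_p^{i_p} e_{i_p}) \otimes (y^1_{j_1} e^{j_1}) \otimes \cdots \otimes (y^q_{j_q} e^{j_q})$

Using the multilinearity of the product on the right (Theorem 3.3), we obtain the
expression

6.4    $x_1^{i_1} \cdots x_p^{i_p} \, y^1_{j_1} \cdots y^q_{j_q} \, e^{j_1 \cdots j_q}_{i_1 \cdots i_p}$

showing that the components of $x_1 \otimes \cdots \otimes x_p \otimes y_1^* \otimes \cdots \otimes y_q^*$ relative to $B$
are the quantities $x_1^{i_1} \cdots x_p^{i_p} \, y^1_{j_1} \cdots y^q_{j_q}$, formed from the components of the
vectors.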

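Let us now compute the components of a tensor product $t \otimes t'$, where $t$ is in $U^p_q$
and $t'$ is in $U^r_s$. Suppose then that

6.5    $t = t^{i_1 \cdots i_p}_{j_1 \cdots j_q} \, e^{j_1 \cdots j_q}_{i_1 \cdots i_p}$,    $t' = t'^{\,k_1 \cdots k_r}_{l_1 \cdots l_s} \, e^{l_1 \cdots l_s}_{k_1 \cdots k_r}$

From the fact that the product in $T(U)$ is bilinear,

6.6    $t \otimes t' = t^{i_1 \cdots i_p}_{j_1 \cdots j_q} \, t'^{\,k_1 \cdots k_r}_{l_1 \cdots l_s} \; e^{j_1 \cdots j_q}_{i_1 \cdots i_p} \otimes e^{l_1 \cdots l_s}_{k_1 \cdots k_r}$

From (6.2) and (5.19), this becomes

$t^{i_1 \cdots i_p}_{j_1 \cdots j_q} \, t'^{\,k_1 \cdots k_r}_{l_1 \cdots l_s} \; e^{j_1 \cdots j_q l_1 \cdots l_s}_{i_1 \cdots i_p k_1 \cdots k_r}$

In other words, if we write

6.7    $t \otimes t' = (t \otimes t')^{i_1 \cdots i_{p+r}}_{j_1 \cdots j_{q+s}} \, e^{j_1 \cdots j_{q+s}}_{i_1 \cdots i_{p+r}}$

then

6.8    $(t \otimes t')^{i_1 \cdots i_{p+r}}_{j_1 \cdots j_{q+s}} = t^{i_1 \cdots i_p}_{j_1 \cdots j_q} \, t'^{\,i_{p+1} \cdots i_{p+r}}_{j_{q+1} \cdots j_{q+s}}$

Therefore, the components of $t \otimes t'$ are obtained by simply multiplying the appropriate
components of $t$ and $t'$, all relative to the given base $B$.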
The effect of a change of base in U on tensor components is easily computed. 


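Let $B_1 = \{u_1, \ldots, u_n\}$ be another base, and let $B_1^* = \{u^1, \ldots, u^n\}$ be its dual
base. Let

6.9    $e_i = a^h_i u_h$,    $e^j = b^j_k u^k$

be the equations connecting $B$ and $B_1$ and the dual bases. From (6.1) we have

$\delta^j_i = \langle b^j_k u^k, a^h_i u_h \rangle = b^j_k a^h_i \langle u^k, u_h \rangle = b^j_k a^h_i \delta^k_h = b^j_h a^h_i$

showing that $(a^h_i)$ and $(b^j_k)$ are inverse matrices.
Substituting (6.9) in (6.2) and using the multilinearity of the tensor product, we
obtain

6.10    $e^{j_1 \cdots j_q}_{i_1 \cdots i_p} = a^{h_1}_{i_1} \cdots a^{h_p}_{i_p} \, b^{j_1}_{k_1} \cdots b^{j_q}_{k_q} \, u^{k_1 \cdots k_q}_{h_1 \cdots h_p}$

where

6.11    $u^{k_1 \cdots k_q}_{h_1 \cdots h_p} = u_{h_1} \otimes \cdots \otimes u_{h_p} \otimes u^{k_1} \otimes \cdots \otimes u^{k_q}$

Referring now to (6.3), if $t$ has components $\bar{t}^{\,h_1 \cdots h_p}_{k_1 \cdots k_q}$ relative to the new base $B_1$,
then from (6.10) and (6.3) there follows readily

6.12    $\bar{t}^{\,h_1 \cdots h_p}_{k_1 \cdots k_q} = a^{h_1}_{i_1} \cdots a^{h_p}_{i_p} \, b^{j_1}_{k_1} \cdots b^{j_q}_{k_q} \, t^{i_1 \cdots i_p}_{j_1 \cdots j_q}$

(See Exercise 2 below for an interpretation of this equation.)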
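
For a tensor of type (1, 1) the transformation law reads $\bar{t}^{\,h}_k = a^h_i \, b^j_k \, t^i_j$, which in matrix language is the familiar rule $\bar{T} = A T A^{-1}$. The sketch below evaluates this sum directly. It assumes $n = 2$, integer entries, and the storage conventions `a[h][i]` for $a^h_i$, `b[j][k]` for $b^j_k$, and `t[i][j]` for $t^i_j$; these conventions and the name `change_of_base` are illustrative choices, not the book's.

```python
def change_of_base(t, a, b):
    """New components of a type (1, 1) tensor: sum over i, j of a[h][i] * b[j][k] * t[i][j]."""
    n = len(t)
    return [[sum(a[h][i] * b[j][k] * t[i][j] for i in range(n) for j in range(n))
             for k in range(n)] for h in range(n)]

a = [[1, 1], [0, 1]]
b = [[1, -1], [0, 1]]       # inverse matrix of a
t = [[1, 2], [3, 4]]

t_new = change_of_base(t, a, b)
trace_old = sum(t[i][i] for i in range(2))
trace_new = sum(t_new[i][i] for i in range(2))
print(t_new, trace_old, trace_new)
```

The two traces agree, as they must: the sum $t^i_i$ does not depend on the base, a fact that reappears in the section on contraction.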

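Finally we compute the mappings $f_0$ and $f^0$ of Theorem 5.2 in terms of components.
Let $f: U \to V$ be a mapping of finite-dimensional vector spaces, and let
$B = \{e_1, \ldots, e_n\}$ and $B' = \{v_1, \ldots, v_m\}$ be bases for $U$ and $V$, respectively.
Let

6.13    $f(e_i) = c^h_i v_h$

By (5.24) we have

$f_0(e_{i_1} \otimes \cdots \otimes e_{i_p}) = f(e_{i_1}) \otimes \cdots \otimes f(e_{i_p})$
    $= (c^{h_1}_{i_1} v_{h_1}) \otimes \cdots \otimes (c^{h_p}_{i_p} v_{h_p})$
    $= c^{h_1}_{i_1} \cdots c^{h_p}_{i_p} \, v_{h_1} \otimes \cdots \otimes v_{h_p}$,

or

6.14    $f_0(e_{i_1 \cdots i_p}) = c^{h_1}_{i_1} \cdots c^{h_p}_{i_p} \, v_{h_1 \cdots h_p}$

where

6.15    $e_{i_1 \cdots i_p} = e_{i_1} \otimes \cdots \otimes e_{i_p}$

and similarly for $v_{h_1 \cdots h_p}$. If $t = t^{i_1 \cdots i_p} e_{i_1 \cdots i_p}$ is an element of $U^p_0$, then by
linearity there follows

6.16    $f_0(t) = t^{i_1 \cdots i_p} f_0(e_{i_1 \cdots i_p}) = c^{h_1}_{i_1} \cdots c^{h_p}_{i_p} \, t^{i_1 \cdots i_p} \, v_{h_1 \cdots h_p}$

showing how to compute the components of $f_0(t)$ relative to the base $B'$ in $V$.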
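To compute $f^0$ we recall from Theorem 4.4, Chap. 9, that

$\langle {}^t f(y^*), x \rangle = \langle y^*, f(x) \rangle$

for $x$ in $U$ and $y^*$ in $V^*$. Let $B^* = \{e^1, \ldots, e^n\}$ and $B'^* = \{v^1, \ldots, v^m\}$ be
the bases dual to $B$ and $B'$. We have

$\langle {}^t f(v^j), e_i \rangle = \langle v^j, f(e_i) \rangle = \langle v^j, c^h_i v_h \rangle = c^h_i \langle v^j, v_h \rangle = c^j_i$

whence

6.17    ${}^t f(v^j) = c^j_i e^i$

Then, writing

6.18    $v^{j_1 \cdots j_q} = v^{j_1} \otimes \cdots \otimes v^{j_q}$,    $e^{i_1 \cdots i_q} = e^{i_1} \otimes \cdots \otimes e^{i_q}$

we have

$f^0(v^{j_1 \cdots j_q}) = {}^t f(v^{j_1}) \otimes \cdots \otimes {}^t f(v^{j_q}) = (c^{j_1}_{i_1} e^{i_1}) \otimes \cdots \otimes (c^{j_q}_{i_q} e^{i_q})$

by (5.23), and so

6.19    $f^0(v^{j_1 \cdots j_q}) = c^{j_1}_{i_1} \cdots c^{j_q}_{i_q} \, e^{i_1 \cdots i_q}$

which shows the effect of $f^0$ on the base elements $v^{j_1 \cdots j_q}$ in $V^0_q$. If

$t = t_{j_1 \cdots j_q} v^{j_1 \cdots j_q}$

is an element of $V^0_q$, then, by linearity of $f^0$,

6.20    $f^0(t) = t_{j_1 \cdots j_q} \, c^{j_1}_{i_1} \cdots c^{j_q}_{i_q} \, e^{i_1 \cdots i_q}$

which shows that the components of $f^0(t)$ relative to $B$ are the quantities

6.21    $t_{j_1 \cdots j_q} \, c^{j_1}_{i_1} \cdots c^{j_q}_{i_q}$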


EXERCISES 
1. Let $f$ be as above, let $t_1$ be in $V^0_q$ and let $t_2$ be in $V^0_s$. Show how to compute
the components of $f^0(t_1 \otimes t_2)$ in $U^0_{q+s}$.
2. Let $g$ denote the automorphism of $U$ given by formula (6.9). Prove that
(6.12) is the operation $g^p_q$ of $g$ on $U^p_q$.


7. Contraction of tensors 


Let $U$ be an n-dimensional vector space over a field $K$, and as usual let $U^*$ denote
its dual space. An element $y^*$ of $U^*$ is a linear mapping $U \to K$, and for any $x$ in $U$
we denote the scalar $y^*(x)$ by the symbol $\langle y^*, x \rangle$. The mapping $U \times U^* \to K$
defined by

7.1    $(x, y^*) \mapsto \langle y^*, x \rangle$

is a bilinear mapping. By definition of $U \otimes U^* = U^1_1$, there is a unique linear
mapping $U^1_1 \to K$ such that

7.2    $x \otimes y^* \mapsto \langle y^*, x \rangle$

This mapping is called contraction. Now let $B = \{e_1, \ldots, e_n\}$ be a base in $U$,
and let $B^* = \{e^1, \ldots, e^n\}$ be the dual base in $U^*$. If

7.3    $x = x^i e_i$,    $y^* = y_j e^j$

then from (6.1) we have

7.4    $\langle y^*, x \rangle = x^i y_i$

Thus (7.2) can be written

7.5    $x^i y_j \, e_i \otimes e^j \mapsto x^i y_j \langle e^j, e_i \rangle = x^i y_i$

More generally, if $t = t^i_j \, e_i \otimes e^j$ is any tensor in $U^1_1$, then the mapping $U^1_1 \to K$
determined by (7.2) sends $t$ into the scalar

7.6    $t^i_j \langle e^j, e_i \rangle = t^i_i$

by linearity. (We note that the components of $t$ form an $n \times n$ matrix $(t^i_j)$, and
the quantity on the right is the sum of its diagonal elements, i.e., the trace of the
matrix.)
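
In components, (7.4) through (7.6) amount to a one-line computation. The following sketch assumes $n = 3$ and stores the components $t^i_j$ as a nested list `t[i][j]`; the names `outer` and `contract_11` are illustrative.

```python
def outer(x, y):
    """Components x^i y_j of the tensor x (x) y* in U^1_1."""
    return [[xi * yj for yj in y] for xi in x]

def contract_11(t):
    """Contraction U^1_1 -> K of (7.6): the sum of the diagonal components."""
    return sum(t[i][i] for i in range(len(t)))

x = [1, 2, 3]          # components x^i
y = [4, 0, -1]         # components y_j

pairing = sum(xi * yi for xi, yi in zip(x, y))     # <y*, x> as in (7.4)
print(contract_11(outer(x, y)), pairing)
```

Both printed numbers coincide, which is formula (7.5) in this example.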

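In this section we show how to extend the operation of contraction to other
tensors. Consider now the space $U^p_q$ defined in (5.7). We assume here that $p > 0$
and $q > 0$. Fix two integers $h, k$, with $1 \le h \le p$ and $1 \le k \le q$, and consider
the mapping

7.7    $\underbrace{U \times \cdots \times U}_{p} \times \underbrace{U^* \times \cdots \times U^*}_{q} \to U^{p-1}_{q-1}$

defined by

7.8    $(x_1, \ldots, x_p, y_1^*, \ldots, y_q^*) \mapsto \langle y_k^*, x_h \rangle \; x_1 \otimes \cdots \otimes \hat{x}_h \otimes \cdots \otimes x_p \otimes y_1^* \otimes \cdots \otimes \hat{y}_k^* \otimes \cdots \otimes y_q^*$

where the symbols $\hat{x}_h$ and $\hat{y}_k^*$ indicate that those factors are to be omitted. Thus
the $x$ part of the tensor product written above stands for

$x_1 \otimes \cdots \otimes x_{h-1} \otimes x_{h+1} \otimes \cdots \otimes x_p$

and similarly for the $y^*$ part. One verifies without difficulty that the mapping
(7.7) so defined is multilinear. Therefore, by Theorem 3.3, there is a unique
linear mapping

7.9    $C^h_k: U^p_q \to U^{p-1}_{q-1}$

such that

7.10    $C^h_k(x_1 \otimes \cdots \otimes x_p \otimes y_1^* \otimes \cdots \otimes y_q^*) = \langle y_k^*, x_h \rangle \; x_1 \otimes \cdots \otimes \hat{x}_h \otimes \cdots \otimes x_p \otimes y_1^* \otimes \cdots \otimes \hat{y}_k^* \otimes \cdots \otimes y_q^*$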

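for any $x_i$ in $U$ and $y_j^*$ in $U^*$. In particular, for the base elements

7.11    $e^{j_1 \cdots j_q}_{i_1 \cdots i_p} = e_{i_1} \otimes \cdots \otimes e_{i_p} \otimes e^{j_1} \otimes \cdots \otimes e^{j_q}$

in $U^p_q$ we have

7.12    $C^h_k(e^{j_1 \cdots j_q}_{i_1 \cdots i_p}) = \langle e^{j_k}, e_{i_h} \rangle \, e^{j_1 \cdots \hat{j}_k \cdots j_q}_{i_1 \cdots \hat{i}_h \cdots i_p} = \delta^{j_k}_{i_h} \, e^{j_1 \cdots \hat{j}_k \cdots j_q}_{i_1 \cdots \hat{i}_h \cdots i_p}$

Let

7.13    $t = t^{i_1 \cdots i_p}_{j_1 \cdots j_q} \, e^{j_1 \cdots j_q}_{i_1 \cdots i_p}$

be any tensor in $U^p_q$. Applying $C^h_k$ to $t$ and using linearity and (7.12), we obtain

$C^h_k(t) = t^{i_1 \cdots i_p}_{j_1 \cdots j_q} \, \delta^{j_k}_{i_h} \, e^{j_1 \cdots \hat{j}_k \cdots j_q}_{i_1 \cdots \hat{i}_h \cdots i_p}$

changing the names of the indices, we can rewrite this as

7.14    $C^h_k(t) = t^{i_1 \cdots i_{h-1} \, m \, i_h \cdots i_{p-1}}_{j_1 \cdots j_{k-1} \, m \, j_k \cdots j_{q-1}} \, e^{j_1 \cdots j_{q-1}}_{i_1 \cdots i_{p-1}}$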



Hence, the components of $C^h_k(t)$ are obtained from those of $t$ by summing over
the $h$th upper index and the $k$th lower index. In particular, we have, for
$t = t^i_j \, e_i \otimes e^j$ in $U^1_1$,

$C^1_1(t) = t^i_i$

as in (7.6). Another important example of this operation is as follows. Take

7.15    $t = t^i_j \, e_i \otimes e^j$ in $U^1_1$    and    $x = x^k e_k$ in $U$

Apply the contraction operation $C^2_1$ to the element

$t \otimes x = t^i_j x^k \, e_i \otimes e_k \otimes e^j$ in $U^2_1$

The result is

$C^2_1(t \otimes x) = t^i_j x^j e_i$ in $U^1_0 = U$

Therefore the operation

7.16    $x \mapsto C^2_1(t \otimes x)$

is the linear mapping $U \to U$ whose matrix relative to the base $B$ is $(t^i_j)$, and in
this way one can regard the tensors in $U^1_1$ as endomorphisms $U \to U$. Similarly,
taking $y^* = y_k e^k$ in $U^*$, we have

$C^1_2(t \otimes y^*) = t^i_j y_i e^j$

and consequently

7.17    $y^* \mapsto C^1_2(t \otimes y^*)$

is the linear mapping $U^* \to U^*$ whose matrix is the transpose of $(t^i_j)$. In fact,
the mapping (7.17) is the transpose of the mapping (7.16).
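
The general contraction is easy to experiment with once tensors are stored by their components. The sketch below assumes a fixed base with indices running from 1 to $n$ and represents a tensor in $U^p_q$ as a dictionary whose keys are pairs (tuple of upper indices, tuple of lower indices). The function names are hypothetical; `tensor_product` follows the component rule (6.8) and `contract` follows (7.12).

```python
def tensor_product(t, u):
    """Components of t (x) u: upper indices are concatenated, and so are lower ones."""
    out = {}
    for (I, J), a in t.items():
        for (K, L), b in u.items():
            key = (I + K, J + L)
            out[key] = out.get(key, 0) + a * b
    return out

def contract(t, h, k):
    """C^h_k: keep the terms whose h-th upper and k-th lower index agree,
    delete that pair of indices, and add up the coefficients."""
    out = {}
    for (up, low), c in t.items():
        if up[h - 1] == low[k - 1]:
            key = (up[:h - 1] + up[h:], low[:k - 1] + low[k:])
            out[key] = out.get(key, 0) + c
    return out

m = [[1, 2], [3, 4]]                                    # the matrix (t^i_j)
t = {((i,), (j,)): m[i - 1][j - 1] for i in (1, 2) for j in (1, 2)}
x = {((1,), ()): 5, ((2,), ()): 6}                      # x = 5 e_1 + 6 e_2
y = {((), (1,)): 5, ((), (2,)): 6}                      # y* = 5 e^1 + 6 e^2

tx = contract(tensor_product(t, x), 2, 1)               # the mapping (7.16)
ty = contract(tensor_product(t, y), 1, 2)               # contraction of t with y*
print(tx, ty)
```

The first result has the components of the matrix $(t^i_j)$ applied to the column of $x$, and the second has the components of the transposed matrix applied to $y^*$.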


The contraction operations $C^h_k$ can be applied in a wide variety of ways, clearly.
In particular, they can be repeated. Consider an element $u = u^{ij}_{kh} \, e^{kh}_{ij}$ of $U^2_2$.
We have

$C^1_1(u) = u^{ij}_{ih} \, e^h_j$

an element of $U^1_1$. Applying $C^1_1$ again (it affects only the two free indices $j, h$)
we get

$C^1_1 \circ C^1_1(u) = u^{ij}_{ij}$

This is easily seen to be the same as $C^1_1 \circ C^2_2(u)$. Similarly,

$C^1_2(u) = u^{ij}_{ki} \, e^k_j$

and so

$C^1_1 \circ C^1_2(u) = u^{ij}_{ji}$




Similar considerations apply to any $U^r_r$. Thus, let

7.18    $u = u^{i_1 \cdots i_r}_{j_1 \cdots j_r} \, e^{j_1 \cdots j_r}_{i_1 \cdots i_r}$

be any tensor in $U^r_r$. The contraction $C^1_1$ results in summation over the first
upper and lower indices, so that $C^1_1(u)$ has components

7.19    $u^{i \, i_2 \cdots i_r}_{i \, j_2 \cdots j_r}$

$C^1_1$ applied to this causes summation over the first upper and lower indices, and
$C^1_1(C^1_1(u))$ has components

7.20    $u^{i_1 i_2 i_3 \cdots i_r}_{i_1 i_2 j_3 \cdots j_r}$

Continuing in this way, one finds at once that application $r$ times of $C^1_1$ to $u$
yields the scalar

7.21    $C^1_1 \circ \cdots \circ C^1_1(u) = u^{i_1 \cdots i_r}_{i_1 \cdots i_r}$

The mapping $u \mapsto u^{i_1 \cdots i_r}_{i_1 \cdots i_r}$ is a linear mapping $U^r_r \to K$, since each $C^1_1$ is linear.


Now take a tensor t = t’ & es, 1, in U’) and a tensor t’ = 2), i 
et---% in U®. Their product t @ t' is in Ut, and its components are ti - * 
th j~ Then (7.21) applied to t ® t’ yields a scalar which we shall denote by 
«’, t). Thus, in terms of the components, 


a (ty at 
Fort =m @+++ @x,andt' = yi ®--- ® yf, this reduces to 
was (9 @ + Sym @- +> Ox) = (yh a) ++ Cw) 


Referring to (7.22), the mapping (t, t’) +t ® ¢’ is bilinear; the mapping (7.21) is 
linear; hence 

7.24 (t,t)  ¢t’, ty 

is a bilinear mapping U'o X U®, + K. For r = 1 it is the same as (7.1). For 
fixed t’, the mapping t — (1’, t) is a linear mapping U’) — K, hence is an element 
of the dual space of U). In this way we can regard U®, as the dual space of U's, 


and vice versa. 
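The reduction of the full contraction to a product of pairings on decomposable tensors can be verified on a small example. This is a sketch under stated assumptions: tensors are stored as dictionaries from index tuples to components, and the helper names `decomposable` and `pairing` are hypothetical.

```python
from itertools import product

def decomposable(vectors):
    """Components of v_1 (x) ... (x) v_r as a dict {(i_1, ..., i_r): value}.

    Works equally for vectors (giving an element of U^r_0) and for covectors
    (giving an element of U^0_r), since only component lists are used.
    """
    n = len(vectors[0])
    comps = {}
    for idx in product(range(n), repeat=len(vectors)):
        value = 1
        for v, i in zip(vectors, idx):
            value *= v[i]
        comps[idx] = value
    return comps

def pairing(t_cov, t_con):
    """The scalar <t', t>: the sum of t'_{i_1...i_r} t^{i_1...i_r} over all indices."""
    return sum(t_cov[idx] * t_con[idx] for idx in t_con)

x1, x2 = [1, 2], [3, 4]   # vectors in U (n = 2)
y1, y2 = [5, 6], [7, 8]   # covectors in U*

lhs = pairing(decomposable([y1, y2]), decomposable([x1, x2]))
rhs = sum(a * b for a, b in zip(y1, x1)) * sum(a * b for a, b in zip(y2, x2))
```

Here `lhs` is the full contraction of the product tensor and `rhs` is the product of the two elementary pairings; the two agree for these sample values.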
Let us make the following definition: 


DEFINITION 7.1 An r-linear function $f$ on a vector space $U$ over $K$ is an r-linear mapping

$f : U \times \cdots \times U \to K$    (r factors)

The latter notion is defined in Definition 2.2.
As a final observation here, let $t'$ be a fixed element in $U^0_r$. Then the entity $t''$ defined by

7.25    $t''(x_1, \ldots, x_r) = \langle t', \, x_1 \otimes \cdots \otimes x_r \rangle$

is clearly an r-linear function on $U$.


THEOREM 7.1 The correspondence $t' \to t''$ defined by (7.25) is an isomorphism from $U^0_r$ to the vector space of r-linear functions on $U$ compatible with the operation of $GL(U)$ on each.


Proof. Let $t''$ be an r-linear function on $U$, and define $t'$ by

$t' = t''(e_{i_1}, \ldots, e_{i_r}) \, e^{i_1 \cdots i_r}$

Then $t'$ is an element of $U^0_r$, and it is easily seen that $t'$ does not depend on the particular choice of base $B = \{e_1, \ldots, e_n\}$. Moreover, (7.25) holds for this $t'$. Therefore the mapping $t' \to t''$ is a one-to-one mapping of $U^0_r$ onto the space of r-linear functions, and it is trivial to verify that it is a linear mapping compatible with the operation of $GL(U)$. Q.E.D.


This theorem gives a rather concrete meaning to $U^0_r$, showing that its elements can be identified with r-linear functions on $U$. In particular, the elements of $U^0_2$ can be regarded as bilinear functions on $U$. In a similar way one shows that elements of $U^r_0$ can be considered as r-linear functions on $U^*$.
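The correspondence of Theorem 7.1 can be made concrete in components: a covariant tensor is recovered from an r-linear function by evaluating it on base vectors, and the function is recovered from the tensor by full contraction. The sketch below assumes a fixed base identified with the standard base of $K^n$; the names `tensor_of` and `function_of` are hypothetical.

```python
from itertools import product

def tensor_of(f, n, r):
    """Components t'_{i_1...i_r} = f(e_{i_1}, ..., e_{i_r}) of an r-linear function f."""
    base = [[1 if a == b else 0 for b in range(n)] for a in range(n)]
    return {idx: f(*(base[i] for i in idx))
            for idx in product(range(n), repeat=r)}

def function_of(comps):
    """The r-linear function (x_1, ..., x_r) -> t'_{i_1...i_r} x_1^{i_1} ... x_r^{i_r}."""
    def f(*xs):
        total = 0
        for idx, c in comps.items():
            term = c
            for x, i in zip(xs, idx):
                term *= x[i]
            total += term
        return total
    return f

# A sample bilinear function on a 2-dimensional space (chosen only for illustration).
def bilinear(x, y):
    return x[0] * y[1] - x[1] * y[0] + 2 * x[0] * y[0]

comps = tensor_of(bilinear, n=2, r=2)
rebuilt = function_of(comps)
```

Evaluating `rebuilt` and `bilinear` on the same pair of vectors gives the same scalar, which is the content of (7.25) for this example.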


Remark. Tensors which arise in physics often occur as multilinear functions on vectors. Most tensors in physical applications are so-called dyadic tensors, that is, elements of $U^2_0$, $U^1_1$, or $U^0_2$ in our terminology. We recall that elements of $U^1_1$ can be interpreted as linear transformations $U \to U$, as in (7.16). We note that elements of $U^0_2$ can be thought of as linear mappings $U \to U^*$. Namely, for $t$ in $U^0_2$ the mappings

$x \to C^1_1(t \otimes x)$

and

$x \to C^1_2(t \otimes x)$

are both linear mappings $U \to U^*$. Similarly, elements of $U^2_0$ can be interpreted (in two ways) as linear mappings $U^* \to U$.


EXERCISES 


1. Let $f: U \to V$ be a linear mapping of finite-dimensional vector spaces over $K$. Let $f_0$ and $f^0$ be the mappings of Theorem 5.2. Prove that $f^0$, restricted to elements of $V^0_r$, is the transpose of $f_0$ restricted to elements of $U^r_0$.

2. Show that elements of $U^1_2$ can be interpreted as linear mappings $U^2_0 \to U$ and as linear mappings $U^* \to U^0_2$.

3. Given a linear mapping $U^p_q \to U^r_s$, show how it can be represented by a tensor in $T(U)$.

4. Let $f$ be an r-linear function on $U$, and let $g$ be an s-linear function on $U$. Let $h$ be the (r + s)-linear function defined by $h(x_1, \ldots, x_{r+s}) = f(x_1, \ldots, x_r) \, g(x_{r+1}, \ldots, x_{r+s})$. Let $u$, $v$, $w$ be the tensors in $U^0_r$, $U^0_s$, $U^0_{r+s}$ corresponding to $f$, $g$, $h$, respectively, according to Theorem 7.1. Prove that $w = u \otimes v$.

5. Prove that the mapping (7.8) is compatible with the operations of $GL(U)$ on $U^p_q$ and $U^{p-1}_{q-1}$.


8. Symmetry properties 


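Again we let $U$ denote an n-dimensional vector space over a field $K$. However, we now impose a condition on $K$: we assume that no multiple of the unit element 1 in $K$ is equal to zero. That is, we assume that the elements $1 + 1$, $1 + 1 + 1$, etc., in $K$ are nonzero. In particular we exclude finite fields. The fields Q, R, C certainly fulfill the requirement, and in later sections we shall be concerned solely with vector spaces over the real field R. The reason for our present restriction will soon be apparent.
As usual we denote by $U^p_q$ the space defined in (5.7). In particular,

8.1    $U^p_0 = U \otimes \cdots \otimes U$    (p factors)

Let $h$ and $k$ be integers with $1 \le h < k \le p$. The mapping

$U \times \cdots \times U \to U^p_0$    (p factors)

defined by

8.2    $(x_1, \ldots, x_p) \to x_1 \otimes \cdots \otimes x_k \otimes \cdots \otimes x_h \otimes \cdots \otimes x_p$

(interchanging $x_h$ and $x_k$) is certainly multilinear. Consequently (Theorem 3.3) there is a unique linear mapping $S_{hk}: U^p_0 \to U^p_0$ such that

8.3    $S_{hk}(x_1 \otimes \cdots \otimes x_h \otimes \cdots \otimes x_k \otimes \cdots \otimes x_p) = x_1 \otimes \cdots \otimes x_k \otimes \cdots \otimes x_h \otimes \cdots \otimes x_p$

and it is easily seen that $S_{hk}$ is an isomorphism. In particular, if $B = \{e_1, \ldots, e_n\}$ is a base for $U$, and if as usual we put

8.4    $e_{i_1 \cdots i_p} = e_{i_1} \otimes \cdots \otimes e_{i_p}$

then

8.5    $S_{hk}(e_{i_1 \cdots i_h \cdots i_k \cdots i_p}) = e_{i_1 \cdots i_k \cdots i_h \cdots i_p}$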




A tensor $t$ in $U^p_0$ is said to be symmetric in the $h$ and $k$ places, or in the $h$ and $k$ indices, if

8.6    $S_{hk}(t) = t$

and $t$ is said to be skew-symmetric, or alternating, in the $h$ and $k$ places, or indices, if

8.7    $S_{hk}(t) = -t$

These conditions are easily transcribed for the components of $t$ relative to the base $B$ as follows: Let

8.8    $t = t^{i_1 \cdots i_p} \, e_{i_1 \cdots i_p}$

From (8.5) and the linearity of $S_{hk}$ we have

$S_{hk}(t) = t^{i_1 \cdots i_h \cdots i_k \cdots i_p} \, e_{i_1 \cdots i_k \cdots i_h \cdots i_p}$

From this and (8.8) it is immediate that (8.6) is equivalent to

8.9    $t^{i_1 \cdots i_k \cdots i_h \cdots i_p} = t^{i_1 \cdots i_h \cdots i_k \cdots i_p}$

for all sets of indices $i_\nu = 1, 2, \ldots, n$. Similarly, (8.7) is equivalent to

8.10    $t^{i_1 \cdots i_k \cdots i_h \cdots i_p} = -t^{i_1 \cdots i_h \cdots i_k \cdots i_p}$

More generally, if $\sigma$ denotes any permutation of the integers $1, \ldots, p$, then there is a unique isomorphism $S_\sigma: U^p_0 \to U^p_0$ such that

8.11    $S_\sigma(x_1 \otimes \cdots \otimes x_p) = x_{\sigma(1)} \otimes \cdots \otimes x_{\sigma(p)}$

for any $x_i$ in $U$, the argument being the same as with $S_{hk}$ above. A tensor $t$ in $U^p_0$ is said to be symmetric with respect to $\sigma$ if

8.12    $S_\sigma(t) = t$

The equivalent condition on the components of $t$ is easily found to be

8.13    $t^{i_1 \cdots i_p} = t^{i_{\sigma(1)} \cdots i_{\sigma(p)}}$

Similarly, $t$ is said to be alternating, or skew-symmetric, with respect to $\sigma$ if

8.14    $S_\sigma(t) = -t$  and  $\sigma$ odd

For the components of $t$ in this case one only has to put a minus sign on one side of (8.13).
Now any permutation $\sigma$ of $\{1, \ldots, p\}$ can be expressed as a product

$\sigma = \sigma_1 \sigma_2 \cdots \sigma_m$

of transpositions (Theorem 3.4, Chap. 10). For the corresponding mappings $S_\sigma, S_{\sigma_1}, \ldots, S_{\sigma_m}$ of $U^p_0$ to itself one easily finds that

8.15    $S_\sigma = S_{\sigma_m} \cdots S_{\sigma_1}$




and therefore problems of symmetry can be reduced to questions involving mappings of the type $S_{hk}$ considered at the beginning of this section.

A tensor $t$ in $U^p_0$ is called symmetric if (8.12) holds for every permutation $\sigma$ of $\{1, \ldots, p\}$. From the remarks just made it is clear that $t$ is symmetric if and only if (8.12) holds for every transposition $\sigma$ of two elements of $\{1, \ldots, p\}$.

Given any tensor $t$ in $U^p_0$ it is easy to construct from it a new tensor having given symmetry properties. For example, suppose we want to obtain from $t$ a symmetric tensor, as just defined. Form the sum

8.16    $S(t) = \sum_\sigma S_\sigma(t)$

where on the right $\sigma$ runs through all permutations of $\{1, \ldots, p\}$. If $\tau$ is any permutation of this set, then

$S_\tau(S(t)) = \sum_\sigma S_\tau(S_\sigma(t)) = \sum_\sigma S_{\sigma\tau}(t)$

Since the permutations of $\{1, \ldots, p\}$ form a group, it follows that, as $\sigma$ runs through all its elements, so does $\sigma\tau$. Therefore the expression on the right above is the same as the right member of (8.16). That is, $S_\tau(S(t)) = S(t)$, and so $S(t)$ is a symmetric tensor. Now if $t$ happened to be symmetric to begin with, then every term $S_\sigma(t)$ in (8.16) is equal to $t$. There are $p!$ terms, and so

8.17    $S(t) = p! \, t$    if $t$ is symmetric

If some multiple of the unit 1 in $K$ were zero, then (8.17) would be zero for large enough $p$. This unsatisfactory situation is precluded by our assumption concerning $K$. The element $p! = p! \times 1$ in $K$ is not zero, hence has an inverse. We can then form the operator $\frac{1}{p!} S$, and it has the property that it is a linear mapping $U^p_0 \to U^p_0$ which carries every tensor $t$ into a symmetric tensor, mapping symmetric tensors into themselves.

For the components of $\frac{1}{p!} S(t)$ it is easily seen from (8.16) that they are the elements $t^{(i_1 \cdots i_p)}$ given by

8.18    $t^{(i_1 \cdots i_p)} = \frac{1}{p!} \sum_\sigma t^{i_{\sigma(1)} \cdots i_{\sigma(p)}}$

For $p = 2$ this reduces to

8.19    $t^{(ij)} = \frac{1}{2}(t^{ij} + t^{ji})$
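The component formula (8.18) translates directly into a short computation. The sketch below assumes a tensor in $U^p_0$ is stored as a dictionary from index tuples to scalars and uses exact rational arithmetic so that division by $p!$ is harmless; the function name `symmetrize` is hypothetical.

```python
from fractions import Fraction
from itertools import permutations, product
from math import factorial

def symmetrize(t, n, p):
    """Components of (1/p!) S(t): the average of t over all permutations of its indices.

    t maps index tuples (i_1, ..., i_p), with entries in range(n), to scalars.
    """
    perms = list(permutations(range(p)))
    out = {}
    for idx in product(range(n), repeat=p):
        total = sum(t[tuple(idx[s] for s in sigma)] for sigma in perms)
        out[idx] = Fraction(total, factorial(p))
    return out

# Sample tensor in U^2_0 with n = 2 (values chosen only for illustration).
t = {(0, 0): 1, (0, 1): 2, (1, 0): 4, (1, 1): 3}
s = symmetrize(t, n=2, p=2)
```

For $p = 2$ the off-diagonal entries of `s` are the averages $\frac{1}{2}(t^{ij} + t^{ji})$, and applying `symmetrize` to `s` again returns `s`, in agreement with (8.17).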


A tensor $t$ in $U^p_0$ is called skew-symmetric, or alternating, if (8.14) holds for every odd permutation $\sigma$ of the set $\{1, \ldots, p\}$. From the remarks above concerning (8.15), and from Theorem 3.5, Chap. 10, it is clear that $t$ is alternating if and only if (8.7) holds for every pair of integers $h$, $k$ with $1 \le h < k \le p$.

Just as with $S$ above, we can build an operator $A$ which produces an alternating tensor from any tensor $t$ in $U^p_0$. Namely, we put

8.20    $A(t) = \frac{1}{p!} \sum_\sigma \mathrm{sign}(\sigma) \cdot S_\sigma(t)$

the sum being over all permutations $\sigma$ of $\{1, \ldots, p\}$, with $\mathrm{sign}(\sigma) = +1$ or $-1$ according as $\sigma$ is even or odd. If $\sigma'$ is any transposition of two elements of $\{1, \ldots, p\}$, then

$S_{\sigma'}(A(t)) = \frac{1}{p!} \sum_\sigma \mathrm{sign}(\sigma) \cdot S_{\sigma'}(S_\sigma(t)) = \frac{1}{p!} \sum_\sigma \mathrm{sign}(\sigma) \cdot S_{\sigma\sigma'}(t) = -\frac{1}{p!} \sum_\sigma \mathrm{sign}(\sigma\sigma') \cdot S_{\sigma\sigma'}(t)$

since $\sigma'$ is odd. As $\sigma$ runs through all permutations of $\{1, \ldots, p\}$, so does $\sigma\sigma'$, and so there follows

$S_{\sigma'}(A(t)) = -A(t)$

showing that $A(t)$ is an alternating tensor. If $t$ itself is alternating, then from (8.14) it follows easily that $\mathrm{sign}(\sigma) \cdot S_\sigma(t) = t$, whence

8.21    $A(t) = t$    if $t$ is alternating

As with (8.18) it is quickly seen that $A(t)$ has as components the elements

8.22    $t^{[i_1 \cdots i_p]} = \frac{1}{p!} \sum_\sigma \mathrm{sign}(\sigma) \, t^{i_{\sigma(1)} \cdots i_{\sigma(p)}}$

For $p = 2$ this reduces to

8.23    $t^{[ij]} = \frac{1}{2}(t^{ij} - t^{ji})$
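The alternation operator can be sketched in the same way as the symmetrization operator, with each term weighted by the sign of the permutation. The dictionary storage and the names `sign` and `alternate` are assumptions made for this illustration; the sign is computed by counting inversions.

```python
from fractions import Fraction
from itertools import permutations, product
from math import factorial

def sign(sigma):
    """+1 for an even permutation, -1 for an odd one (parity of the inversion count)."""
    inversions = sum(1 for a in range(len(sigma))
                     for b in range(a + 1, len(sigma)) if sigma[a] > sigma[b])
    return -1 if inversions % 2 else 1

def alternate(t, n, p):
    """Components of A(t): the signed average of t over all permutations of its indices."""
    perms = list(permutations(range(p)))
    out = {}
    for idx in product(range(n), repeat=p):
        total = sum(sign(sigma) * t[tuple(idx[s] for s in sigma)] for sigma in perms)
        out[idx] = Fraction(total, factorial(p))
    return out

# p = 2: the components reduce to (t^{ij} - t^{ji}) / 2.
t2 = {(0, 0): 1, (0, 1): 2, (1, 0): 4, (1, 1): 3}
a2 = alternate(t2, n=2, p=2)

# p = 3, n = 3: an arbitrary sample tensor.
t3 = {idx: idx[0] + 10 * idx[1] ** 2 + 100 * idx[2] ** 3
      for idx in product(range(3), repeat=3)}
a3 = alternate(t3, n=3, p=3)
```

Interchanging two indices of `a3` changes the sign of the component, any component with a repeated index vanishes, and `alternate(a3, 3, 3)` returns `a3`, as (8.21) requires.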


Alternating tensors will play an important role later in this chapter. We observe that for the base vectors (8.4) the formula (8.20) gives

8.24    $A(e_{i_1 \cdots i_p}) = \frac{1}{p!} \sum_\sigma \mathrm{sign}(\sigma) \cdot e_{i_{\sigma(1)}} \otimes \cdots \otimes e_{i_{\sigma(p)}}$

The components of $e_{a_1 \cdots a_p}$ relative to that same base are clearly the elements $\delta^{i_1}_{a_1} \cdots \delta^{i_p}_{a_p}$, since

$e_{a_1 \cdots a_p} = \delta^{i_1}_{a_1} \cdots \delta^{i_p}_{a_p} \, e_{i_1 \cdots i_p}$

Putting these in (8.22) we find that the components of $A(e_{a_1 \cdots a_p})$ are the elements

8.25    $\frac{1}{p!} \sum_\sigma \mathrm{sign}(\sigma) \, \delta^{i_{\sigma(1)}}_{a_1} \cdots \delta^{i_{\sigma(p)}}_{a_p} = \frac{1}{p!} \det \bigl( \delta^{i_\mu}_{a_\nu} \bigr)_{\mu, \nu = 1, \ldots, p}$

by (2.7) of Chap. 11.


It is clear that an entirely analogous discussion can be carried through for the covariant tensor space $U^0_q$. We shall not bother to write out the corresponding formulas, which are essentially the same as the foregoing, with upper and lower indices interchanged. Putting the results together with those above, one obtains similar results for mixed tensors in $U^p_q = U^p_0 \otimes U^0_q$.


EXERCISES 
1. Exhibit a linear mapping $Y: U^p_0 \to U^p_0$ such that $Y(t)$ is skew-symmetric in the first three places, assuming $p \ge 3$, and such that $Y(t) = t$ if $t$ already has that property.

2. Exhibit a linear mapping $W: U^1_3 \to U^1_3$ such that $W(t)$ is skew-symmetric in its first two covariant places and such that $W(t) = t$ if $t$ already has that property.

3. Prove that the operators $\frac{1}{p!} S$ and $A$ defined above are idempotent. What are their eigenvalues and eigenvectors?

4. Prove that the skew-symmetric tensors in $U^p_0$ form a subspace. Compute its dimension. Do the same for the symmetric tensors.

5. The operator $A$ can be represented as a tensor in $U^p_p$. Show how that can be done. Do the same for $\frac{1}{p!} S$.


9. The metric 


Let $U$ be an n-dimensional euclidean vector space over the real field R. We recall from Sec. 11, Chap. 8, and Sec. 7, Chap. 14, that $U$ must be equipped with a symmetric bilinear form, denoted here simply by parentheses $( \, , \, )$, such that $(x, x) > 0$ unless $x = 0$. The length of a vector $x$ is defined by

9.1    $|x| = \sqrt{(x, x)}$

From Theorem 7.1, the form $( \, , \, )$ can be considered as an element of $U^0_2$. If $B_1 = \{u_1, \ldots, u_n\}$ is any base in $U$, then the matrix of the form relative to that base, call it $(g_{ij})$, is given by

9.2    $g_{ij} = (u_i, u_j)$

The element

9.3    $g^* = g_{ij} \, u^i \otimes u^j$

in $U^0_2$ is precisely the one that can be identified with the given form according to Theorem 7.1, since the contraction

$C^1_1 \circ C^1_1(g^* \otimes x \otimes y) = g_{ij} x^i y^j$    $(x = x^i u_i, \; y = y^i u_i)$

is precisely the inner product $(x, y)$. We observe that $g^*$ is a symmetric tensor, as defined in the preceding section. A base $B = \{e_1, \ldots, e_n\}$ in $U$ is called orthonormal if

$(e_i, e_j) = \delta_{ij} = \begin{cases} 0 & \text{if } i \ne j \\ 1 & \text{if } i = j \end{cases}$

Such bases always exist, and (9.3) reduces to

9.4    $g^* = \sum_{i=1}^{n} e^i \otimes e^i$


The bilinear form $( \, , \, )$, or the equivalent tensor $g^*$, is sometimes called a metric on $U$. Our purpose here is to show that all the subspaces $U^p_q$ of the tensor algebra inherit metrics from the given one on $U$, hence become euclidean vector spaces. First of all, consider the linear mapping $L: U \to U^*$ defined by

9.5    $L(x) = (x, \; )$

That is, $L(x)$ denotes the element of $U^*$ that maps an arbitrary $y$ into $(x, y)$. In other words,

9.6    $\langle L(x), y \rangle = (x, y)$

For the base $B_1 = \{u_1, \ldots, u_n\}$ we have

$\langle L(u_i), u_j \rangle = (u_i, u_j) = g_{ij}$

whence

9.7    $L(u_i) = g_{ij} u^j$

since both sides are elements of $U^*$ which have the same effect on any vector in $U$. Since $(g_{ij})$ is positive definite, its determinant is nonzero, and therefore $L$ is an isomorphism. Consequently the inverse mapping $L^{-1}$ exists and is given by a certain matrix

9.8    $L^{-1}(u^i) = g^{ij} u_j$

relative to the base $B_1$ and its dual. From (9.7),

$u_i = L^{-1}(L(u_i)) = L^{-1}(g_{ij} u^j) = g_{ij} L^{-1}(u^j) = g_{ij} g^{jk} u_k$




showing that

9.9    $g_{ij} g^{jk} = \delta^k_i = \begin{cases} 1 & \text{if } i = k \\ 0 & \text{if } i \ne k \end{cases}$

That is, $(g_{ij})$ and $(g^{ij})$ are inverse matrices. Since a matrix and its inverse commute, (9.9) implies the equations

9.10    $g^{ij} g_{jk} = \delta^i_k$

We use the mapping $L^{-1}$ to define an inner product on $U^*$, denoted also by parentheses. Thus, for two elements $x^*$ and $y^*$ of $U^*$ we set

9.11    $(x^*, y^*) = (L^{-1} x^*, L^{-1} y^*)$

the right-hand side being the given inner product in $U$. It is obvious that (9.11) defines a symmetric bilinear function on $U^*$, and it is positive definite, since the right-hand side of (9.11) is positive definite. We define the length of a vector $x^*$ in $U^*$ as usual by

9.12    $|x^*| = \sqrt{(x^*, x^*)} = |L^{-1}(x^*)|$

Thus we have simply defined the length in $U^*$ in such a way that $L$ (and $L^{-1}$) are length-preserving mappings.
To compute the inner product in $U^*$ we have, for the dual base elements $u^i$,

$(u^i, u^j) = (L^{-1} u^i, L^{-1} u^j) = (g^{ih} u_h, g^{jk} u_k) = g^{ih} g^{jk} (u_h, u_k) = g^{ih} g^{jk} g_{hk}$

by (9.8) and (9.2). From (9.10), $g^{jk} g_{kh} = \delta^j_h$, and so $(u^i, u^j) = g^{ij}$. Since $(u^i, u^j) = (u^j, u^i)$, we have $g^{ij} = g^{ji}$, and so we can write

9.13    $(u^i, u^j) = g^{ij}$

For vectors

$x^* = x_i u^i$    $y^* = y_i u^i$

in $U^*$ there results $(x^*, y^*) = (x_i u^i, y_j u^j) = x_i y_j (u^i, u^j)$, whence

9.14    $(x^*, y^*) = g^{ij} x_i y_j$
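Relations (9.7) to (9.14) can be checked on a small numerical example. The sketch below is an illustration under stated assumptions: the metric matrix is an arbitrary positive definite sample, the inverse is computed by Gauss-Jordan elimination over exact rationals, and the names `inverse`, `lower`, `raise_index`, `inner` and `dual_inner` are hypothetical.

```python
from fractions import Fraction

def inverse(m):
    """Inverse of a nonsingular square matrix by Gauss-Jordan elimination."""
    n = len(m)
    a = [[Fraction(v) for v in row] + [Fraction(int(i == j)) for j in range(n)]
         for i, row in enumerate(m)]
    for col in range(n):
        pivot = next(r for r in range(col, n) if a[r][col] != 0)
        a[col], a[pivot] = a[pivot], a[col]
        pv = a[col][col]
        a[col] = [v / pv for v in a[col]]
        for r in range(n):
            if r != col and a[r][col] != 0:
                factor = a[r][col]
                a[r] = [vr - factor * vc for vr, vc in zip(a[r], a[col])]
    return [row[n:] for row in a]

def lower(g, x):
    """Components (L x)_j = g_ij x^i of the covector L(x)."""
    n = len(x)
    return [sum(g[i][j] * x[i] for i in range(n)) for j in range(n)]

def raise_index(g_inv, y):
    """Components g^ij y_i of the vector L^{-1}(y*)."""
    n = len(y)
    return [sum(g_inv[i][j] * y[i] for i in range(n)) for j in range(n)]

def inner(g, x, y):
    """(x, y) = g_ij x^i y^j."""
    n = len(x)
    return sum(g[i][j] * x[i] * y[j] for i in range(n) for j in range(n))

def dual_inner(g_inv, xs, ys):
    """(x*, y*) = g^ij x_i y_j."""
    n = len(xs)
    return sum(g_inv[i][j] * xs[i] * ys[j] for i in range(n) for j in range(n))

g = [[2, 1], [1, 3]]          # sample metric matrix (g_ij), positive definite
g_inv = inverse(g)            # the matrix (g^ij)
x, y = [1, 2], [0, 1]
```

With these values, lowering and then raising an index returns the original vector, and the inner product of `lower(g, x)` and `lower(g, y)` in $U^*$ equals the inner product of `x` and `y` in $U$, which is the statement that $L$ preserves lengths.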
Consider now the special case of an orthonormal base $B = \{e_1, \ldots, e_n\}$, and as usual let $B^* = \{e^1, \ldots, e^n\}$ be the dual base in $U^*$. For this base the elements $g_{ij}$ of (9.2) become

$(e_i, e_j) = \delta_{ij} = \begin{cases} 1 & \text{if } i = j \\ 0 & \text{if } i \ne j \end{cases}$

Therefore (9.7) becomes

$L(e_i) = \delta_{ij} e^j = e^i$

and similarly (9.8) reduces to

$L^{-1}(e^i) = e_i$

Then (9.11) gives us

9.15    $(e^i, e^j) = (e_i, e_j) = \delta_{ij}$

showing that the dual base $B^*$ is also an orthonormal base.

We now use the inner products on $U$ and $U^*$ to define an inner product on $U^p_q$ for any $p$, $q$. Let $w$ be an element of $U^{2q}_{2p}$, and let $t$ and $t'$ be elements of $U^p_q$. Then the tensor $w \otimes t \otimes t'$ is in $U^{2p+2q}_{2p+2q}$. By performing a suitable series of $2(p + q)$ contractions, denoted simply by $C$, we shall obtain from $w \otimes t \otimes t'$ a scalar $C(w \otimes t \otimes t')$, as in (7.21). The mapping

9.16    $(t, t') \to C(w \otimes t \otimes t')$

will then be a bilinear mapping $U^p_q \times U^p_q \to K$. With a suitable choice of $w$ and $C$ we can hope to get in this way a symmetric, positive definite bilinear form on $U^p_q$. To carry this out we start with

9.17    $g = g^{ij} \, u_i \otimes u_j$    $g^* = g_{ij} \, u^i \otimes u^j$

These are symmetric elements of $U^2_0$ and $U^0_2$, respectively, and they do not depend upon the particular choice of base $B_1 = \{u_1, \ldots, u_n\}$, as is easily verified. We now set

9.18    $w = g \otimes \cdots \otimes g \otimes g^* \otimes \cdots \otimes g^*$    (q factors $g$, p factors $g^*$)

which is indeed an element in $U^{2q}_{2p}$. We must now specify the contraction operator $C$. The notation used in Sec. 7 is rather cumbersome, and $C$ can be more simply described as follows: The components of $w$ with respect to the base $B_1 = \{u_1, \ldots, u_n\}$ are

$g^{j_1 l_1} \cdots g^{j_q l_q} \, g_{i_1 k_1} \cdots g_{i_p k_p}$

with suitable choice of indices. We choose $C$ to be the contraction that gives

9.19    $C(w \otimes t \otimes t') = g_{i_1 k_1} \cdots g_{i_p k_p} \, g^{j_1 l_1} \cdots g^{j_q l_q} \, t^{i_1 \cdots i_p}_{j_1 \cdots j_q} \, t'^{\, k_1 \cdots k_p}_{l_1 \cdots l_q}$

where $t^{i_1 \cdots i_p}_{j_1 \cdots j_q}$ are the components of $t$ relative to the base $B_1$, and similarly for $t'$.

The inner product in $U^p_q$ is then defined to be

9.20    $(t, t') = C(w \otimes t \otimes t')$

From (9.19) it is easily seen to be symmetric. For an orthonormal base $B = \{e_1, \ldots, e_n\}$ in $U$, (9.19) reduces to

9.21    $(t, t') = \sum t^{i_1 \cdots i_p}_{j_1 \cdots j_q} \, t'^{\, i_1 \cdots i_p}_{j_1 \cdots j_q}$



the sum being over all sets of indices $i_\mu, j_\nu = 1, \ldots, n$. In particular, (9.21) gives

9.22    $(t, t) = \sum \bigl( t^{i_1 \cdots i_p}_{j_1 \cdots j_q} \bigr)^2$

showing that $(t, t) > 0$ unless $t = 0$. Thus the bilinear function (9.20) gives $U^p_q$ the structure of a euclidean vector space. We define the magnitude, or length, of an element $t$ by

9.23    $|t| = \sqrt{(t, t)}$

Furthermore, the elements

9.24    $e^{j_1 \cdots j_q}_{i_1 \cdots i_p} = e_{i_1} \otimes \cdots \otimes e_{i_p} \otimes e^{j_1} \otimes \cdots \otimes e^{j_q}$

form an orthonormal base in $U^p_q$, if the $e_i$ form an orthonormal base in $U$. This follows at once from (9.21) since the components of (9.24) relative to that base are

$\delta^{a_1}_{i_1} \cdots \delta^{a_p}_{i_p} \, \delta^{j_1}_{b_1} \cdots \delta^{j_q}_{b_q}$

Finally, if $t = x_1 \otimes \cdots \otimes x_p \otimes y_1^* \otimes \cdots \otimes y_q^*$, then (9.19) gives

9.25    $(t, t) = (x_1, x_1) \cdots (x_p, x_p)(y_1^*, y_1^*) \cdots (y_q^*, y_q^*)$

and a similar expression holds for the inner product $(t, t')$ of two tensors of this type.
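The product formula for decomposable tensors can be tested in the purely contravariant case $q = 0$, where (9.19) involves only the matrix $(g_{ij})$. The sketch below makes that restriction explicitly; the sample metric and the names `decomposable`, `inner_vectors` and `inner_tensors` are assumptions for this illustration.

```python
from itertools import product

def decomposable(vectors):
    """Components of x_1 (x) ... (x) x_p as a dict {(i_1, ..., i_p): value}."""
    n = len(vectors[0])
    comps = {}
    for idx in product(range(n), repeat=len(vectors)):
        value = 1
        for v, i in zip(vectors, idx):
            value *= v[i]
        comps[idx] = value
    return comps

def inner_vectors(g, x, y):
    """(x, y) = g_ij x^i y^j."""
    n = len(g)
    return sum(g[i][j] * x[i] * y[j] for i in range(n) for j in range(n))

def inner_tensors(g, t, s, p):
    """(t, s) for t, s in U^p_0: g_{i_1 k_1} ... g_{i_p k_p} t^{i_1...i_p} s^{k_1...k_p}."""
    n = len(g)
    total = 0
    for i in product(range(n), repeat=p):
        for k in product(range(n), repeat=p):
            weight = 1
            for a, b in zip(i, k):
                weight *= g[a][b]
            total += weight * t[i] * s[k]
    return total

g = [[2, 1], [1, 3]]              # sample metric matrix (g_ij)
x1, x2 = [1, 2], [3, -1]
y1, y2 = [0, 1], [1, 1]
t = decomposable([x1, x2])
s = decomposable([y1, y2])
```

Here `inner_tensors(g, t, t, 2)` equals the product of `inner_vectors(g, x1, x1)` and `inner_vectors(g, x2, x2)`, and the analogous identity holds for the mixed product of `t` and `s`.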


EXERCISES

1. Describe the contraction operation $C$ used above in the notation of Sec. 7 for the case $p = 2$, $q = 1$.

2. Prove in detail that the elements (9.24) form an orthonormal base in $U^p_q$.

3. Show how to make $T(U)$ into a euclidean vector space.

4. Let $t$ be in $U^p_q$, and let $t'$ be in $U^r_s$. Prove that $|t \otimes t'| = |t| \cdot |t'|$, and show that the definition of $|t|$ given above is the only one for which this holds for all values of $p$, $q$, $r$, $s$.

5. Prove: If one considers the operations of only the unitary group on a euclidean vector space $U$, then $U^p_q$, $U^0_{p+q}$, and $U^{p+q}_0$ are all canonically isomorphic. [Hint: Use the canonical isomorphism $U \to U^*$ of the euclidean space $U$.]


10. The exterior algebra 


Starting with a vector space U over a field K, we now describe an important 
algebraic system called the exterior algebra of U. 

As usual let $T_0(U)$ denote the contravariant tensor algebra of $U$. Thus $T_0(U)$ is the direct sum of the subspaces

10.1    $U^p_0 = U \otimes \cdots \otimes U$    (p factors)




and $T_0(U)$ has the product operation $\otimes$. Let $S$ denote the ideal of $T_0(U)$ generated by all elements of the type

10.2    $x \otimes x$    ($x$ in $U$)

That is, $S$ consists of all elements in $T_0(U)$ which can be obtained from elements of the type (10.2) by a finite number of the three operations in $T_0(U)$, that is, addition, scalar multiplication, and products by arbitrary elements in $T_0(U)$.

It is clear first of all that the sum and difference of any two elements of $S$ are again in $S$. Hence $S$ is a subgroup of $T_0(U)$, regarded simply as an abelian group, and we can therefore form the quotient group $T_0(U)/S$, consisting of all the cosets of $S$ (see Sec. 4, Chap. 10). Every coset of $S$ can be written (in many ways) in the form $t + S$, $t$ being any element of $T_0(U)$ in the coset. We recall that addition in $T_0(U)/S$ is given by

10.3    $(t_1 + S) + (t_2 + S) = (t_1 + t_2) + S$

We make $T_0(U)/S$ into an algebra by defining two other operations. First, if $c$ is a scalar, we set

10.4    $c \cdot (t + S) = ct + S$

If $t' + S = t + S$, then $t' - t$ is in $S$. From the definition of $S$ it is clear that $c(t' - t)$ is also in $S$, whence $ct' + S = ct + S$, showing that (10.4) depends only on the coset, not upon the particular element $t$ in it. Operations (10.3) and (10.4) make $T_0(U)/S$ into a vector space over $K$.

Finally we define a product operation $\wedge$ in $T_0(U)/S$ by the rule

10.5    $(t_1 + S) \wedge (t_2 + S) = (t_1 \otimes t_2) + S$

The right-hand side depends only on the cosets. For if $t_1'$ is in $t_1 + S$ and if $t_2'$ is in $t_2 + S$, then both $t_1' - t_1$ and $t_2' - t_2$ are in $S$. From the definition of $S$ it is clear that the products $(t_1' - t_1) \otimes t_2'$ and $t_1 \otimes (t_2' - t_2)$ must also be in $S$. Hence so is their sum $t_1' \otimes t_2' - t_1 \otimes t_2$, and therefore $(t_1' \otimes t_2') + S = (t_1 \otimes t_2) + S$.

It follows at once from (10.5) that this new product operation, called exterior multiplication, is associative.

The quotient algebra† $T_0(U)/S$ just defined is called the exterior algebra of $U$. We shall denote it by $\Lambda U$. It has a very simple structure, which we now examine.

† This construction is described in detail in Chap. 15.

Let $P: T_0(U) \to \Lambda U$ be the canonical mapping, sending each element $t$ of $T_0(U)$ into the coset containing it:

10.6    $P(t) = t + S$

From (10.3) and (10.4) it follows immediately that $P$ is a linear mapping with kernel $S$; (10.5) is simply the equation

10.7    $P(t_1 \otimes t_2) = P(t_1) \wedge P(t_2)$

showing that $P$ is compatible with the product operations in $T_0(U)$ and $\Lambda U$. Hence, $P$ is a homomorphism of K-algebras.

$P$ maps the subspace $U^p_0$ of $T_0(U)$ onto a certain subspace of $\Lambda U$. We denote that subspace by $\Lambda^p U$. Thus

10.8    $\Lambda^p U = P(U^p_0)$

From its definition it is clear that $S = \mathrm{Ker} \, P$ contains no nonzero elements of $U^0_0 = K$ or of $U^1_0 = U$, and consequently $P$ maps $K$ isomorphically onto $\Lambda^0 U$ and it maps $U$ isomorphically onto $\Lambda^1 U$. For this reason we shall simply identify $\Lambda^0 U$ with $K$ and $\Lambda^1 U$ with $U$. Thus we have

10.9    $P(c) = c$ for $c$ in $K$;    $P(x) = x$ for $x$ in $U$

Now $U^p_0$ is spanned by elements of the type $x_1 \otimes \cdots \otimes x_p$ ($x_i$ in $U$). From (10.7), $P(x_1 \otimes \cdots \otimes x_p) = P(x_1) \wedge \cdots \wedge P(x_p)$, which can be written as $x_1 \wedge \cdots \wedge x_p$, by (10.9). Since $P$ maps $U^p_0$ onto $\Lambda^p U$, we conclude that

10.10    $\Lambda^p U$ is spanned by elements of the type $x_1 \wedge \cdots \wedge x_p$ with $x_i$ in $U$ ($p > 0$)

Elements of $\Lambda^p U$ are said to have degree $p$.

From (10.2) it is clear that $x \otimes x$ is in $S$ for any vector $x$ in $U$. Hence $P(x \otimes x) = 0$, or $P(x) \wedge P(x) = 0$, by (10.7). Writing simply $x$ for $P(x)$, as in (10.9), we obtain the rule

10.11    $x \wedge x = 0$    for all $x$ in $U$

The idea behind the definition of $\Lambda U$ begins to emerge at this point. The ideal $S$ contains all elements of $T_0(U)$ which have any sort of symmetry. Such elements go into zero in the quotient space $T_0(U)/S$. Equation (10.11) is a particular instance of this.


10.12    The mapping $U \times \cdots \times U \to \Lambda^p U$ (p factors) defined by $(x_1, \ldots, x_p) \to x_1 \wedge \cdots \wedge x_p$ is p-linear.

This follows from the fact that the mapping in question is the composition of the multilinear mapping $(x_1, \ldots, x_p) \to x_1 \otimes \cdots \otimes x_p$ and of the linear mapping $P$.

If $x$ and $y$ are vectors in $U$, then

$(x + y) \wedge (x + y) = x \wedge x + x \wedge y + y \wedge x + y \wedge y$    by (10.12)

From (10.11), $(x + y) \wedge (x + y) = 0$, $x \wedge x = 0$, $y \wedge y = 0$, and there follows $x \wedge y + y \wedge x = 0$, or

10.13    $x \wedge y = -y \wedge x$    ($x$, $y$ in $U$)




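From these there results at once

10.14    $x_{i_1} \wedge \cdots \wedge x_{i_p} = \mathrm{sign} \begin{pmatrix} 1 & \cdots & p \\ i_1 & \cdots & i_p \end{pmatrix} x_1 \wedge \cdots \wedge x_p$    ($x_i$ in $U$)

showing that the expression $x_1 \wedge \cdots \wedge x_p$ is skew-symmetric in its entries. We have further

10.15    $x_1 \wedge \cdots \wedge x_p = 0$ if any two of the $x_i$ are identical ($x_i$ in $U$)

For suppose $x_i = x_k$. By a suitable permutation we can put $x_i$ and $x_k$ next to each other. If $x_i = x_k$, then $x_i \wedge x_k = 0$, by (10.11). The permutation can at most change the sign, by (10.14), from which the assertion follows.

Applying (10.13) repeatedly, or else using (10.14), we have

10.16    $x_1 \wedge \cdots \wedge x_p \wedge y_1 \wedge \cdots \wedge y_q = (-1)^{pq} \, y_1 \wedge \cdots \wedge y_q \wedge x_1 \wedge \cdots \wedge x_p$

for any $x_i$, $y_j$ in $U$. From (10.10), any element $u$ in $\Lambda^p U$ can be written as a linear combination of elements $x_1 \wedge \cdots \wedge x_p$. Similarly, for any $v$ in $\Lambda^q U$. Hence from (10.16) we have

10.17    $u \wedge v = (-1)^{pq} \, v \wedge u$    for $u$ in $\Lambda^p U$, $v$ in $\Lambda^q U$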
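Rules (10.11), (10.13) and (10.17) can be observed in a small computational model of the exterior product. The sketch below is an assumption-laden illustration, not the construction of the text: an element is stored as a dictionary from strictly increasing index tuples to coefficients relative to a fixed base of $U$, and the names `sort_sign`, `wedge` and `vector` are hypothetical.

```python
def sort_sign(idx):
    """Return (sign, sorted tuple) for an index tuple, or (0, None) if an index repeats."""
    if len(set(idx)) < len(idx):
        return 0, None
    inversions = sum(1 for a in range(len(idx))
                     for b in range(a + 1, len(idx)) if idx[a] > idx[b])
    return (-1 if inversions % 2 else 1), tuple(sorted(idx))

def wedge(u, v):
    """Exterior product of two elements given as {increasing index tuple: coefficient}."""
    out = {}
    for a, ca in u.items():
        for b, cb in v.items():
            s, key = sort_sign(a + b)
            if s:
                out[key] = out.get(key, 0) + s * ca * cb
    return {k: c for k, c in out.items() if c != 0}

def vector(coords):
    """The element x^i e_i of degree 1."""
    return {(i,): c for i, c in enumerate(coords) if c != 0}

x = vector([1, 2, 3])
y = vector([4, 5, 6])
z = vector([1, 0, 2])

xy = wedge(x, y)
yx = wedge(y, x)
```

In this model `wedge(x, x)` is the empty dictionary (the zero element), `yx` is the negative of `xy`, and `wedge(xy, z)` equals `wedge(z, xy)` because the sign $(-1)^{pq}$ is $+1$ for $p = 2$, $q = 1$.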


[Observe that (10.11) does not necessarily hold for an element $x$ in $\Lambda^p U$ with $p > 1$.] The following fact is of basic importance:

10.18    $x_1 \wedge \cdots \wedge x_p = 0$ if and only if $x_1, \ldots, x_p$ are linearly dependent ($x_i$ in $U$)

First of all, if the vectors are linearly dependent, then one of them can be expressed as a linear combination of the others, say, $x_p = c_1 x_1 + \cdots + c_{p-1} x_{p-1}$. Then from (10.12) we obtain

$x_1 \wedge \cdots \wedge x_{p-1} \wedge x_p = x_1 \wedge \cdots \wedge x_{p-1} \wedge (c_1 x_1 + \cdots + c_{p-1} x_{p-1}) = \sum_{i=1}^{p-1} c_i \, x_1 \wedge \cdots \wedge x_{p-1} \wedge x_i$

Each term on the right is zero, by (10.15). A similar argument holds in general, clearly.

The converse proposition is not quite so easy. We shall prove it for the case of a finite-dimensional space $U$, $\dim U = n$, say. Let us first look at the ingredients of the ideal $S$. By definition $S$ consists of all finite sums of elements of the type $t \otimes y \otimes y \otimes t'$, with $y$ in $U$ and $t$, $t'$ arbitrary elements of $T_0(U)$. Since $t$ and $t'$ can be expressed as sums of elements of the type $w_1 \otimes \cdots \otimes w_h$ ($w_i$ in $U$), it follows that every element of $S$ can be expressed as a sum of elements of the type

10.19    $w_1 \otimes \cdots \otimes w_h \otimes y \otimes y \otimes z_1 \otimes \cdots \otimes z_k$

with $w_i$, $y$, and $z_j$ all in $U$.




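Let $x_1, \ldots, x_p$ be linearly independent elements of $U$. We can find elements $x_{p+1}, \ldots, x_n$ in $U$ such that $x_1, \ldots, x_n$ form a base for $U$. If $x_1 \wedge \cdots \wedge x_n = 0$, then from (10.14) and (10.15) we have

$x_{i_1} \wedge \cdots \wedge x_{i_n} = 0$

for any indices $i_\nu = 1, \ldots, n$. The left member is equal to $P(x_{i_1} \otimes \cdots \otimes x_{i_n})$, and thus $x_{i_1} \otimes \cdots \otimes x_{i_n}$ must be in the kernel of $P$, that is, in the ideal $S$. But the $n^n$ elements $x_{i_1} \otimes \cdots \otimes x_{i_n}$ span $U^n_0$, and so every element of $U^n_0$ is in $S$. We shall show that this is impossible.

If $t$ is an element of $U^n_0$ and if $t$ is in $S$, then $t$ can be written as a sum of elements of the type (10.19). Since $T_0(U)$ is a direct sum of $U^0_0$, $U^1_0$, etc., it follows that, in the resulting expression for $t$, all terms (10.19) which do not have exactly $n$ factors must add up to zero, hence can be omitted. Therefore $t$ can be expressed as a sum of terms (10.19), with $h + k + 2 = n$. Since $x_1, \ldots, x_n$ span $U$, we can write

$w_i = a_i^j x_j$    $y = b^j x_j$    $z_i = c_i^j x_j$

Putting this in (10.19), we obtain

$a_1^{j_1} \cdots a_h^{j_h} \, b^{j_{h+1}} b^{j_{h+2}} \, c_1^{j_{h+3}} \cdots c_k^{j_n} \; x_{j_1} \otimes \cdots \otimes x_{j_n}$

If we put

10.20    $u_{j_1 \cdots j_n} = x_{j_1} \otimes \cdots \otimes x_{j_{h+1}} \otimes x_{j_{h+2}} \otimes \cdots \otimes x_{j_n} + x_{j_1} \otimes \cdots \otimes x_{j_{h+2}} \otimes x_{j_{h+1}} \otimes \cdots \otimes x_{j_n}$

then the expression above can be written as

$\sum_{j_{h+1} \le j_{h+2}} a_1^{j_1} \cdots a_h^{j_h} \, b^{j_{h+1}} b^{j_{h+2}} \, c_1^{j_{h+3}} \cdots c_k^{j_n} \; u_{j_1 \cdots j_n}$

(the terms with $j_{h+1} = j_{h+2}$ being counted with the factor $\frac{1}{2}$). Hence, if every element of $U^n_0$ is in $S$, then $U^n_0$ is spanned by elements of the type (10.20) with $j_{h+1} \le j_{h+2}$. The number of such elements is manifestly less than $n^n = \dim U^n_0$, a contradiction. Therefore we must have $x_1 \wedge \cdots \wedge x_n \ne 0$. But then

$(x_1 \wedge \cdots \wedge x_p) \wedge (x_{p+1} \wedge \cdots \wedge x_n) \ne 0$

from which $x_1 \wedge \cdots \wedge x_p \ne 0$. Q.E.D.


From (10.18) we have the immediate corollary

10.21    $\Lambda^p U = \{0\}$    if $p > n = \dim U$

For $\Lambda^p U$ is spanned by elements of the type $x_1 \wedge \cdots \wedge x_p$ ($x_i$ in $U$), and every such element is zero if $p > n$, by (10.18).

It follows easily that $\Lambda U$ is the direct sum of the subspaces $\Lambda^p U$ ($p = 0, \ldots, n$). That is,

10.22    $\Lambda U = \Lambda^0 U \oplus \Lambda^1 U \oplus \cdots \oplus \Lambda^n U$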




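We now compute the dimensions of the $\Lambda^p U$. Let $B = \{e_1, \ldots, e_n\}$ be a base for $U$. If $x_i = x_i^j e_j$ ($i = 1, \ldots, p$), then

$x_1 \wedge \cdots \wedge x_p = (x_1^{j_1} e_{j_1}) \wedge \cdots \wedge (x_p^{j_p} e_{j_p}) = x_1^{j_1} \cdots x_p^{j_p} \; e_{j_1} \wedge \cdots \wedge e_{j_p}$

by (10.12). From this and (10.10) it is clear that $\Lambda^p U$ is spanned by the elements $e_{j_1} \wedge \cdots \wedge e_{j_p}$. But from (10.14) and (10.15) we see at once that $\Lambda^p U$ is in fact spanned by the elements $e_{j_1} \wedge \cdots \wedge e_{j_p}$ with $1 \le j_1 < \cdots < j_p \le n$. There are $\binom{n}{p}$ choices of distinct indices $j_1, \ldots, j_p$ from 1 to $n$, and they can be arranged uniquely in increasing order. Hence, there are $\binom{n}{p}$ elements $e_{j_1} \wedge \cdots \wedge e_{j_p}$ such that $1 \le j_1 < \cdots < j_p \le n$. These elements are linearly independent in $\Lambda^p U$. For suppose that

10.23    $\sum_{j_1 < \cdots < j_p} c^{j_1 \cdots j_p} \; e_{j_1} \wedge \cdots \wedge e_{j_p} = 0$

for some scalars $c^{j_1 \cdots j_p}$. Let $k_{p+1}, \ldots, k_n$ be distinct integers from 1 to $n$, and form the exterior product of the left member above with the element $e_{k_{p+1}} \wedge \cdots \wedge e_{k_n}$. All terms

$e_{j_1} \wedge \cdots \wedge e_{j_p} \wedge e_{k_{p+1}} \wedge \cdots \wedge e_{k_n}$    ($j_1 < \cdots < j_p$)

vanish except the one for which $j_1, \ldots, j_p$ is the complementary set of indices $k_1, \ldots, k_p$ corresponding to $k_{p+1}, \ldots, k_n$, so that $k_1, \ldots, k_n$ is a permutation of $1, \ldots, n$. Thus the product of the left side of (10.23) and $e_{k_{p+1}} \wedge \cdots \wedge e_{k_n}$ reduces to the single term

$c^{k_1 \cdots k_p} \; e_{k_1} \wedge \cdots \wedge e_{k_n}$    (no summation)

and this must be zero, by (10.23). But $e_{k_1} \wedge \cdots \wedge e_{k_n}$ is not zero, by (10.18), showing that $c^{k_1 \cdots k_p} = 0$. We have therefore proved that

10.24    $\dim \Lambda^p U = \binom{n}{p}$    ($p = 0, 1, \ldots, n$)

and that

10.25    the elements $e_{j_1} \wedge \cdots \wedge e_{j_p}$ with $1 \le j_1 < \cdots < j_p \le n$ form a base in $\Lambda^p U$ if $\{e_1, \ldots, e_n\}$ is a base in $U$ (for $p > 0$)

From (10.22), (10.24)

10.26    $\dim \Lambda U = \binom{n}{0} + \binom{n}{1} + \cdots + \binom{n}{n} = (1 + 1)^n = 2^n$

Hence we have exhibited the structure of the exterior algebra $\Lambda U$.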
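The counting behind (10.24) and (10.26) is easy to reproduce: the base elements of degree $p$ correspond to the strictly increasing index tuples of length $p$. The following sketch only enumerates index tuples; the function name `base_of_degree` and the choice $n = 4$ are assumptions for this illustration.

```python
from itertools import combinations
from math import comb

def base_of_degree(n, p):
    """Index tuples (j_1 < ... < j_p) labelling the base elements of degree p."""
    return list(combinations(range(1, n + 1), p))

n = 4
dims = [len(base_of_degree(n, p)) for p in range(n + 1)]
total = sum(dims)
```

For $n = 4$ the list `dims` consists of the binomial coefficients $\binom{4}{p}$, and `total` is $2^4$.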




THEOREM 10.1 Let $f: U \times \cdots \times U \to W$ be a p-linear mapping such that $f(x_1, \ldots, x_p) = 0$ whenever two adjacent $x$'s are equal. Then there exists a unique linear mapping $f': \Lambda^p U \to W$ such that $f(x_1, \ldots, x_p) = f'(x_1 \wedge \cdots \wedge x_p)$ for any $x_i$ in $U$.


Proof. By Theorem 3.3 there is a linear mapping $f'': U^p_0 \to W$ such that $f(x_1, \ldots, x_p) = f''(x_1 \otimes \cdots \otimes x_p)$. If $x_1 \otimes \cdots \otimes x_p$ is an element of the type (10.19), then $f''(x_1 \otimes \cdots \otimes x_p) = 0$, by the assumption on $f$. By linearity it follows that every element of $S$ which is in $U^p_0$ is mapped into zero by $f''$. Therefore, if $t$ and $t'$ are elements of $U^p_0$ such that $t - t'$ is in $S$, then $f''(t) = f''(t')$. It follows that $f''$ maps every element of $U^p_0$ in a coset of $S$ into the same element of $W$. Then, for any element $u$ of $\Lambda^p U$ we simply define $f'(u) = f''(t)$, where $t$ is any element of $U^p_0$ in the coset $u$. It is quickly verified that $f'$ has the desired properties. Q.E.D.


The following theorem is of some importance. 


THEOREM 10.2 Let $U$ and $V$ be finite-dimensional vector spaces over a field $K$, and let $f: U \to V$ be a linear mapping. Then there is a unique homomorphism $\hat f: \Lambda U \to \Lambda V$ of the exterior algebras such that $\hat f(x) = f(x)$ for any $x$ in $U$. $\hat f$ maps $\Lambda^p U$ to $\Lambda^p V$ for all $p$.

Proof. $\hat f$ is required to be a homomorphism of algebras and must therefore be compatible with the three operations in $\Lambda U$ and $\Lambda V$, namely addition, scalar multiplication, and exterior product. In particular, $\hat f$ must be a linear mapping.

To show that $\hat f$ exists, let $f_0: T_0(U) \to T_0(V)$ be the homomorphism of the tensor algebras described in Theorem 5.2. As above, let $P: T_0(U) \to \Lambda U$ be the canonical mapping, and let $P': T_0(V) \to \Lambda V$ be the similar mapping for $V$. Put $S = \mathrm{Ker} \, P$ and $S' = \mathrm{Ker} \, P'$. As we have seen, every element of $S$ can be written as a sum of terms of the form (10.19). Applying $f_0$ to (10.19), we obtain the element

10.27    $f(w_1) \otimes \cdots \otimes f(w_h) \otimes f(y) \otimes f(y) \otimes f(z_1) \otimes \cdots \otimes f(z_k)$

since $f_0$ is compatible with $\otimes$ and since $f_0(x) = f(x)$ for any $x$ in $U$. The element $f(y)$ is in $V$, of course, and therefore (10.27) is an element of the ideal $S'$, by definition of $\Lambda V = T_0(V)/S'$.

Therefore $f_0$ maps $S$ to $S'$. If $t + S$ is a coset of $S$, that is, an element of $\Lambda U$, we define

10.28    $\hat f(t + S) = f_0(t) + S'$

If $t + S = t_1 + S$, then $t - t_1$ is in $S$, and so $f_0(t - t_1) = f_0(t) - f_0(t_1)$ is in $S'$. Therefore $f_0(t) + S' = f_0(t_1) + S'$, showing that (10.28) depends only on the coset and not upon the particular representative $t$. Hence




(10.28) defines a mapping f from AU to AV.t Since P(t) = t + S and 
P'(fo(t)) = fo(t) + S’, Eq. (10.28) can be written 
azo (P(t) = P*(fo(t)) 
It is a straightforward matter to check that Phas the required properties. 


For example, if u and v are elements of AU, say, u = t; + S and vy = 
te + S, then u = P(t) and vy = P(t). Hence, 


uAv = P(t @ ty) 
by (10.7). By (10.29) we have 


fu ay) = (PtH ® t)) 
= P'(fo(t: @ &)) 
P'(fo(tr) @ fo(te)) 
P'(fa(ta)) A P’(folte)) 

= ftw) a fay 
The linearity of f follows in 2 similar way. For xin U we have P(x) = x, 
by (10.9), and (10.29) gives us f(x) = P’(f(x)) = f(x), since fo(x) = f(x) 
and since P’(y) = y for any element in V. The uniqueness of f is easily 
demonstrated. For linearity implies that fis uniquely determined by its 
effect on a base in AU. Such a base is exhibited in (10.25) (the base 
element in AU = K is simply the unit 1). Now 


wae fea ++ Ae;,) = Kes)a--- alles) 
=fleyha ++ - afle;,) 


since f is compatible with A and since f(x) = f(x} for xin U. Therefore 
the left member of (10.30) is completely determined by f, and consequently 
fis determined by f. Q.6.D. 


coroucary Let f: U — V and g: V — W be linear mappings of finite-dimensional 
vector spaces over K. Seth = gof. Then for the mappings of Theorem 10.1 we 
havehh = got 
Proof. It is easily verified that g° f is a homomorphism from AU to 
AW and that h(x) = ge &(x) for xin U. Therefore h = ge ¢, by unique- 
ness. @.E.D. 


The theorem has an important application. Namely, let $f: U \to U$ be a linear
mapping of an $n$-dimensional vector space over $K$. Then, by the theorem, $f$
determines an endomorphism $\hat f$ of $\Lambda U$, and $\hat f$ maps the one-dimensional vector
space $\Lambda^n U$ to itself. But an endomorphism of a one-dimensional vector space is
completely determined by a single scalar. Thus, if $u$ is any nonzero element of
$\Lambda^n U$, then $\hat f(u)$ must be a multiple of $u$, say

10.31    $\hat f(u) = cu$    ($c$ in $K$)


† See Exercise 2, Sec. 4, Chap. 10. $\hat f$ is simply the induced mapping of the quotients.




since $\hat f(u)$ is in the space spanned by $u$. The scalar $c$ does not depend upon the
particular choice of $u \neq 0$ in $\Lambda^n U$.


DEFINITION 10.1 The scalar $c$ of (10.31) determined by the mapping $f$ of $U$ is called
the determinant of $f$, denoted by $\det f$.

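It is easy to show that this definition coincides with our earlier definition
(Definition 5.1, Chap. 11). To do so we show how to compute $\hat f$ in terms of bases.
Let $B = \{e_1, \ldots, e_n\}$ be a base in $U$, and let $f$ be given by

10.32    $f(e_i) = c_i^j e_j$

Then

10.33    $\hat f(e_{i_1} \wedge \cdots \wedge e_{i_p}) = f(e_{i_1}) \wedge \cdots \wedge f(e_{i_p})$
$= (c_{i_1}^{j_1} e_{j_1}) \wedge \cdots \wedge (c_{i_p}^{j_p} e_{j_p})$
$= c_{i_1}^{j_1} \cdots c_{i_p}^{j_p} \, e_{j_1} \wedge \cdots \wedge e_{j_p}$

by (10.30). If any of the indices $i_1, \ldots, i_p$ are equal, then both sides must be
zero. We therefore assume that $1 \le i_1 < \cdots < i_p \le n$. The sum on the right
runs over all $p$-tuples $j_1, \ldots, j_p$ from 1 to $n$. Since $e_{j_1} \wedge \cdots \wedge e_{j_p} = 0$ if two
indices are repeated, we can restrict the summation to $p$-tuples of distinct indices.
Now let $1 \le k_1 < \cdots < k_p \le n$ be a $p$-tuple arranged in increasing order. If
$j_1, \ldots, j_p$ is a permutation of it, then

$e_{j_1} \wedge \cdots \wedge e_{j_p} = \operatorname{sign}\begin{pmatrix} j_1 \cdots j_p \\ k_1 \cdots k_p \end{pmatrix} e_{k_1} \wedge \cdots \wedge e_{k_p}$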


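where no summation is understood on the right. Collecting together all terms in
(10.33) which are permutations of $k_1, \ldots, k_p$, one easily sees that they add up to

$\sum \operatorname{sign}\begin{pmatrix} j_1 \cdots j_p \\ k_1 \cdots k_p \end{pmatrix} \cdot c_{i_1}^{j_1} \cdots c_{i_p}^{j_p} \cdot e_{k_1} \wedge \cdots \wedge e_{k_p}$    (no summation on $k_1, \ldots, k_p$)

the sum being over all permutations $j_1, \ldots, j_p$ of $k_1, \ldots, k_p$. From (2.7), Chap. 11, the
total coefficient of $e_{k_1} \wedge \cdots \wedge e_{k_p}$ is the quantity

10.34    $c_{i_1 \cdots i_p}^{k_1 \cdots k_p} = \begin{vmatrix} c_{i_1}^{k_1} & \cdots & c_{i_1}^{k_p} \\ \vdots & & \vdots \\ c_{i_p}^{k_1} & \cdots & c_{i_p}^{k_p} \end{vmatrix}$

Hence, for the base (10.25), we have

10.35    $\hat f(e_{i_1} \wedge \cdots \wedge e_{i_p}) = \sum_{k_1 < \cdots < k_p} c_{i_1 \cdots i_p}^{k_1 \cdots k_p} \, e_{k_1} \wedge \cdots \wedge e_{k_p}$

The quantities (10.34) therefore give the "matrix" of $\hat f$ on $\Lambda^p U$ relative to the base
consisting of the elements $e_{i_1} \wedge \cdots \wedge e_{i_p}$ with $i_1 < \cdots < i_p$.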



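For $p = n$, (10.35) becomes

10.36    $\hat f(e_1 \wedge \cdots \wedge e_n) = c_{1 \cdots n}^{1 \cdots n} \, e_1 \wedge \cdots \wedge e_n$

and by (10.34), $c_{1 \cdots n}^{1 \cdots n} = \det (c_i^j)$. Hence the determinant of $f$ in Definition 10.1
is the same as that defined in Chap. 11.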
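
The computation above can be tried out numerically. The following is only a sketch under stated assumptions: the function names (`perm_sign`, `det`, `exterior_power`) and the sample matrix are hypothetical, indices run from 0 rather than 1, and the array `c[j][i]` is taken to hold the coefficient of $e_j$ in $f(e_i)$. Under that convention the entry of the "matrix" of the induced mapping on $\Lambda^p U$ in row $k_1 < \cdots < k_p$ and column $i_1 < \cdots < i_p$ is the minor (10.34).

```python
from itertools import combinations, permutations


def perm_sign(perm):
    """Sign of a permutation of 0..p-1, found by counting inversions."""
    sign = 1
    for a in range(len(perm)):
        for b in range(a + 1, len(perm)):
            if perm[a] > perm[b]:
                sign = -sign
    return sign


def det(m):
    """Determinant by the permutation expansion (adequate for small matrices)."""
    total = 0
    for perm in permutations(range(len(m))):
        term = perm_sign(perm)
        for row, col in enumerate(perm):
            term *= m[row][col]
        total += term
    return total


def exterior_power(c, p):
    """Matrix of the induced mapping on Lambda^p U.

    c[j][i] is the coefficient of e_j in f(e_i) (an assumed storage
    convention). Rows and columns are indexed by increasing p-tuples;
    each entry is the p x p minor on the chosen rows and columns.
    """
    index_sets = list(combinations(range(len(c)), p))
    return [[det([[c[k][i] for i in cols] for k in rows]) for cols in index_sets]
            for rows in index_sets]


# hypothetical sample mapping of a 3-dimensional space
c = [[2, 1, 0],
     [0, 1, 3],
     [1, 0, 1]]

print(exterior_power(c, 2))  # the 3 x 3 array of 2 x 2 minors
print(exterior_power(c, 3))  # a 1 x 1 array holding det c
```

For $p = 1$ the function simply returns `c`, and for $p = n$ it returns the single number $\det (c_i^j)$, in agreement with (10.36).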


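Observe that for the special case of the identity mapping of $U$, (10.32) becomes
$f(e_i) = \delta_i^j e_j$, and the corresponding quantities (10.34) are

10.37    $\delta_{i_1 \cdots i_p}^{k_1 \cdots k_p} = \begin{vmatrix} \delta_{i_1}^{k_1} & \cdots & \delta_{i_1}^{k_p} \\ \vdots & & \vdots \\ \delta_{i_p}^{k_1} & \cdots & \delta_{i_p}^{k_p} \end{vmatrix}$

We have

10.38    $\delta_{i_1 \cdots i_p}^{k_1 \cdots k_p} = \begin{cases} 1 & \text{if } k_1, \ldots, k_p \text{ is an even permutation of } i_1, \ldots, i_p \\ -1 & \text{if } k_1, \ldots, k_p \text{ is an odd permutation of } i_1, \ldots, i_p \\ 0 & \text{otherwise.} \end{cases}$

These are the generalized Kronecker symbols.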


Using Definition 10.1 it is possible to write out a theory of determinants without
writing any matrices. As a simple example, it follows at once from the corollary
of Theorem 10.2 that

10.39    $\det (g \circ f) = \det g \cdot \det f$

for two endomorphisms $f$ and $g$ of $U$. Hence we have the product rule for
determinants. One easily concludes that $f: U \to U$ is an isomorphism if and only if
$\det f \neq 0$.
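
Definition 10.1 can also be followed literally on a machine: form $f(e_1) \wedge \cdots \wedge f(e_n)$ in the exterior algebra and read off the coefficient of $e_1 \wedge \cdots \wedge e_n$. The sketch below is an illustration only; the representation of an element of $\Lambda U$ as a dictionary from increasing index tuples to coefficients, the names `wedge`, `vector`, `det_of_map`, `compose`, and the sample mappings are all assumptions made here, and indices start at 0.

```python
def wedge(u, v):
    """Exterior product of two elements of Lambda U.

    An element is a dict {(i1, ..., ip): coefficient} with i1 < ... < ip.
    """
    out = {}
    for iu, cu in u.items():
        for iv, cv in v.items():
            if set(iu) & set(iv):
                continue  # a repeated base vector gives zero
            merged = list(iu + iv)
            sign = 1
            # bubble sort; every transposition of neighbours flips the sign
            for a in range(len(merged)):
                for b in range(len(merged) - 1 - a):
                    if merged[b] > merged[b + 1]:
                        merged[b], merged[b + 1] = merged[b + 1], merged[b]
                        sign = -sign
            key = tuple(merged)
            out[key] = out.get(key, 0) + sign * cu * cv
    return {k: c for k, c in out.items() if c != 0}


def vector(coords):
    """The element sum coords[i] * e_i of Lambda^1 U."""
    return {(i,): c for i, c in enumerate(coords) if c != 0}


def det_of_map(images):
    """det f, where images[i] lists the coordinates of f(e_i)."""
    top = {(): 1}  # the unit element of Lambda^0 U = K
    for image in images:
        top = wedge(top, vector(image))
    return top.get(tuple(range(len(images))), 0)


def compose(g_images, f_images):
    """Images of the base vectors under g o f."""
    n = len(f_images)
    return [[sum(f_images[i][j] * g_images[j][k] for j in range(n))
             for k in range(n)] for i in range(n)]


# hypothetical sample endomorphisms of a 3-dimensional space
f = [[2, 0, 1], [1, 1, 0], [0, 3, 1]]  # f(e_0), f(e_1), f(e_2)
g = [[1, 2, 0], [0, 1, 1], [1, 0, 1]]  # g(e_0), g(e_1), g(e_2)

# product rule (10.39) on this example
assert det_of_map(compose(g, f)) == det_of_map(g) * det_of_map(f)
```

No matrix determinant routine is used; the sign bookkeeping of the wedge product does all the work, which is the point of the matrix-free approach described above.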


EXERCISES 

1. Let $B = \{e_1, \ldots, e_n\}$ and $B' = \{u_1, \ldots, u_n\}$ be two bases in $U$. Work
out the connection between the corresponding bases (10.25) in $\Lambda^p U$. What is the
result if $B'$ is obtained from $B$ by simply permuting the $e_i$?

2. Let $f: U \to V$ be a linear mapping of finite-dimensional vector spaces, and
suppose that $f$ maps $U$ isomorphically onto a subspace of $V$. Prove that the
homomorphism $\hat f$ of Theorem 10.1 maps $\Lambda U$ isomorphically onto a subalgebra
of $\Lambda V$.

3. Let $x$ be an element of an $n$-dimensional vector space $U$. Compute the
dimension of the kernel of the mapping $\Lambda^p U \to \Lambda^{p+1} U$ defined by $u \mapsto x \wedge u$.

4. Let $u$ be an element of $\Lambda^p U$, and let $v$ be an element of $\Lambda^q U$, $U$ being an
$n$-dimensional vector space. Show how to compute the components of $u \wedge v$ from
the components of $u$ and $v$ relative to some base in $U$.




*5. Let the vector space $U$ be the direct sum $V \oplus W$ of two finite-dimensional
subspaces. Show that $\Lambda^r U$ is isomorphic in a natural way to the direct sum

$(\Lambda^0 V \otimes \Lambda^r W) \oplus (\Lambda^1 V \otimes \Lambda^{r-1} W) \oplus \cdots \oplus (\Lambda^r V \otimes \Lambda^0 W)$

If $f$ is an endomorphism of $U$ for which $V$ and $W$ are both $f$-stable, show that
$\det f = \det f' \cdot \det f''$, using Definition 10.1, where $f'$ is the restriction of $f$ to $V$
and where $f''$ is the restriction of $f$ to $W$.

6. Let $f: U \to U$ be an endomorphism of the linear space $U$. Let $V$ be a subspace
with base $x_1, \ldots, x_d$. Prove that $f$ sends $V$ into $V$ if and only if $x_1 \wedge \cdots \wedge x_d$
is an eigenvector of the induced mapping $\hat f$ of $f$ on the exterior algebra.


11. Plücker coordinates; duality


Let $V$ be a subspace of an $n$-dimensional vector space $U$ over a field $K$. The
mapping $f: V \to U$ defined by $f(x) = x$ ($x$ in $V$) is a linear mapping, and by
Theorem 10.1 there is associated with it a homomorphism $\hat f: \Lambda V \to \Lambda U$. If
$\dim V = p$, then $\Lambda^p V$ is a one-dimensional space, and it is mapped by $\hat f$ onto a
one-dimensional subspace $L_V$ of $\Lambda^p U$. Thus, if $\{v_1, \ldots, v_p\}$ is any base for $V$,
then $\Lambda^p V$ is spanned by the element $v_1 \wedge \cdots \wedge v_p$, and $\hat f$ maps this element into

11.1    $\hat f(v_1 \wedge \cdots \wedge v_p) = f(v_1) \wedge \cdots \wedge f(v_p) = v_1 \wedge \cdots \wedge v_p$


in $\Lambda^p U$. Strictly speaking, we should use different symbols for the exterior
product in $\Lambda V$ and $\Lambda U$, since the symbol $v_1 \wedge \cdots \wedge v_p$ has two different
meanings here: on the left side of (11.1) it designates an element of $\Lambda^p V$, on the right
it designates an element of $\Lambda^p U$. In any case, $v_1, \ldots, v_p$ are linearly independent
elements of $U$, and therefore the right-hand side of (11.1) is a nonzero element in
$\Lambda^p U$, by (10.18). It spans a one-dimensional subspace which depends only on $V$,
and we have called that space $L_V$.

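Now let $B = \{e_1, \ldots, e_n\}$ be a base in $U$. $L_V$ is spanned by any nonzero
element in it, for example $v_1 \wedge \cdots \wedge v_p$, and by (10.25) any such element $t$ can
be written uniquely in the form

11.2    $t = \sum_{i_1 < \cdots < i_p} t^{i_1 \cdots i_p} \, e_{i_1} \wedge \cdots \wedge e_{i_p}$

Any two nonzero elements $t$ and $t'$ in $L_V$ are multiples of each other: $t' = ct$ and
$t = c^{-1}t'$. Hence, if

$t' = \sum_{i_1 < \cdots < i_p} t'^{\,i_1 \cdots i_p} \, e_{i_1} \wedge \cdots \wedge e_{i_p}$

then

11.3    $t'^{\,i_1 \cdots i_p} = c \, t^{i_1 \cdots i_p}$

DEFINITION 11.1 The scalars $t^{i_1 \cdots i_p}$ of (11.2) are called Plücker coordinates of
the subspace $V$ (relative to the base $B$ in $U$).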




Equation (11.3) shows that any two sets of Plücker coordinates of $V$ differ by
a scalar factor. Hence the ratios of the $t^{i_1 \cdots i_p}$ are the same as the corresponding
ratios of the $t'^{\,i_1 \cdots i_p}$. The ratios are therefore uniquely determined by $V$.
Sometimes the ratios of the $t^{i_1 \cdots i_p}$, rather than the $t$'s themselves, are called the Plücker
coordinates of $V$.
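
As an illustration of (11.2) and (11.3), here is a small sketch that computes Plücker coordinates as the $p \times p$ minors of the array of coordinates of $v_1, \ldots, v_p$, and checks that passing to another base of the same subspace multiplies all of them by one scalar. The function names and the sample vectors are hypothetical, and indices start at 0.

```python
from itertools import combinations


def det(m):
    """Determinant by cofactor expansion along the first row."""
    if not m:
        return 1
    total = 0
    for j, entry in enumerate(m[0]):
        minor = [row[:j] + row[j + 1:] for row in m[1:]]
        total += (-1) ** j * entry * det(minor)
    return total


def pluecker(vectors):
    """Coefficients of v_1 ^ ... ^ v_p on the base e_{i1} ^ ... ^ e_{ip}.

    Returns {(i1, ..., ip): t^{i1...ip}} with i1 < ... < ip; each value
    is the minor built from the columns i1, ..., ip.
    """
    p, n = len(vectors), len(vectors[0])
    return {cols: det([[v[i] for i in cols] for v in vectors])
            for cols in combinations(range(n), p)}


# a plane in a 4-dimensional space, given by two base vectors
v1, v2 = [1, 0, 2, 1], [0, 1, 1, 3]
t = pluecker([v1, v2])

# another base of the same plane: w1 = 2 v1 + v2, w2 = v1 - v2
w1 = [2 * a + b for a, b in zip(v1, v2)]
w2 = [a - b for a, b in zip(v1, v2)]
t_prime = pluecker([w1, w2])

# the change of base has determinant 2 * (-1) - 1 * 1 = -3
print(all(t_prime[k] == -3 * t[k] for k in t))
```

Only the ratios of the six numbers in `t` carry information about the plane, which is why they serve as homogeneous coordinates.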

Consider the vector space $K_{r+1}$ of $(r + 1)$-tuples $x = (x_0, \ldots, x_r)$, $x_i$ in $K$.
Let us call two such vectors $x$ and $y$ equivalent if they are both nonzero and if
$x = cy$ for some scalar $c \neq 0$. This relation of equivalence splits the nonzero
vectors of $K_{r+1}$ into equivalence classes, and clearly each equivalence class
consists of all nonzero elements in a one-dimensional subspace of $K_{r+1}$. Thus the
equivalence classes are in one-to-one correspondence with the lines through the
origin in $K_{r+1}$.

DEFINITION 11.2 The set of all equivalence classes of nonzero vectors in $K_{r+1}$, as
defined above, is called the projective space of dimension $r$ over $K$, denoted by $P_r(K)$.
If $Q$ is any point (that is, element) of $P_r(K)$, and if $x = (x_0, \ldots, x_r)$ is any vector
in the equivalence class of which $Q$ consists, then the $x_i$ are called homogeneous
coordinates of $Q$.

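If we take $r = \binom{n}{p} - 1 = \dim \Lambda^p U - 1$, then we see that the Plücker
coordinates of $V$, enumerated in some fixed order, can be considered as the homogeneous
coordinates of a point in $P_r(K)$. However, not every point in $P_r(K)$ represents a
$p$-dimensional subspace of $U$. For every element of $L_V$ is a multiple of
$v_1 \wedge \cdots \wedge v_p$. Elements of this type are called simple, or decomposable, $p$-vectors. If $t$ is
in $L_V$, then

$t \wedge t = 0$    since    $(v_1 \wedge \cdots \wedge v_p) \wedge (v_1 \wedge \cdots \wedge v_p) = 0$

by (10.15). This imposes certain relations upon the Plücker coordinates. For
example, take $n = 4$, $p = 2$, so that (11.2) becomes

$t = \sum_{i < j} t^{ij} \, e_i \wedge e_j$

Then

$t \wedge t = \Bigl(\sum_{i < j} t^{ij} \, e_i \wedge e_j\Bigr) \wedge \Bigl(\sum_{k < l} t^{kl} \, e_k \wedge e_l\Bigr)$
$= \sum_{i < j,\ k < l} t^{ij} t^{kl} \, e_i \wedge e_j \wedge e_k \wedge e_l$
$= 2(t^{12}t^{34} - t^{13}t^{24} + t^{14}t^{23}) \, e_1 \wedge e_2 \wedge e_3 \wedge e_4$

Since $t \wedge t = 0$ if $t$ is in $L_V$, we find that the Plücker coordinates must satisfy the
equation

11.4    $t^{12}t^{34} - t^{13}t^{24} + t^{14}t^{23} = 0$

if $1 + 1 \neq 0$ in $K$.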
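
The relation (11.4) is easy to test on examples. The sketch below uses hypothetical helper names and sample data, with indices 1 to 4 as in the text; it evaluates the left member of (11.4) for a decomposable 2-vector and for $e_1 \wedge e_2 + e_3 \wedge e_4$, which fails the relation over the rational or real numbers.

```python
def two_vector(v, w):
    """Coordinates t^{ij} (i < j, indices 1..4) of v ^ w in dimension 4."""
    return {(i + 1, j + 1): v[i] * w[j] - v[j] * w[i]
            for i in range(4) for j in range(i + 1, 4)}


def pluecker_relation(t):
    """Left member of (11.4)."""
    return t[1, 2] * t[3, 4] - t[1, 3] * t[2, 4] + t[1, 4] * t[2, 3]


# a decomposable 2-vector: the relation holds
t = two_vector([1, 0, 2, 1], [0, 1, 1, 3])
print(pluecker_relation(t))

# e1 ^ e2 + e3 ^ e4: the left member is not zero, so by the argument
# above this element is not decomposable
s = {(1, 2): 1, (1, 3): 0, (1, 4): 0, (2, 3): 0, (2, 4): 0, (3, 4): 1}
print(pluecker_relation(s))
```

The second element therefore gives a point of $P_5(K)$ that corresponds to no two-dimensional subspace.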




In general, those points of $P_r(K)$ which correspond to $p$-dimensional subspaces
of $U$ must satisfy certain conditions which can be expressed as algebraic
conditions of the type (11.4) on their homogeneous coordinates. The points of the
projective space which do correspond to $p$-dimensional subspaces of $U$ constitute
a subset called the Grassmann variety of $p$-dimensional subspaces in $U$.

Now consider the special case of an oriented euclidean vector space $U$ of
dimension $n$ over $R$. We recall (Sec. 18, Chap. 14) that the orientation simply divides
the bases in $U$ into positive and negative classes, two bases in the same class being
related by a matrix with positive determinant.

Denote by $(x, y)$ the inner product of two vectors in $U$. That same operation
makes every subspace $V$ of $U$ into a euclidean vector space, of course, and $V$ can
be oriented in two ways (if $\dim V > 0$).


Let $\{e_1, \ldots, e_n\}$ be a positive orthonormal base in $U$, and let $u_1, \ldots, u_n$
be arbitrary vectors, say, $u_i = c_i^j e_j$. Let $f$ be the endomorphism of $U$ such that
$f(e_i) = u_i$ ($i = 1, \ldots, n$). We make the following definition:

11.5    $\det (u_1, \ldots, u_n) = \det f$

By Theorem 10.2, $\hat f(e_1 \wedge \cdots \wedge e_n) = f(e_1) \wedge \cdots \wedge f(e_n) = u_1 \wedge \cdots \wedge u_n$. By
Definition 10.1, $\hat f(e_1 \wedge \cdots \wedge e_n) = (\det f) \cdot e_1 \wedge \cdots \wedge e_n$. Hence

11.6    $u_1 \wedge \cdots \wedge u_n = \det (u_1, \ldots, u_n) \cdot e_1 \wedge \cdots \wedge e_n$

By (10.33), (10.34),

11.7    $\det (u_1, \ldots, u_n) = \det (c_i^j)$

If $\{u_1, \ldots, u_n\}$ is also an orthonormal base, then $c = (c_i^j)$ is an orthogonal
matrix, ${}^t c \, c = I$, whence $(\det c)^2 = 1$, or $\det c = \pm 1$. If $\{e_1, \ldots, e_n\}$,
$\{u_1, \ldots, u_n\}$ are both positive orthonormal bases (or both negative), then $\det c = +1$,
and (11.6) becomes

$u_1 \wedge \cdots \wedge u_n = e_1 \wedge \cdots \wedge e_n$

We define

11.8    $\Omega = e_1 \wedge \cdots \wedge e_n$

where $\{e_1, \ldots, e_n\}$ is any positive orthonormal base. The preceding equation
shows that $\Omega$ does not depend upon the choice of base. Equation (11.6) can be
written

11.9    $u_1 \wedge \cdots \wedge u_n = \det (u_1, \ldots, u_n) \cdot \Omega$

It is clear that $\det (u_1, \ldots, u_n)$ does not depend on the positive orthonormal
base involved in (11.5). From Sec. 6, Chap. 11, the number $|\det (u_1, \ldots, u_n)|$
can be interpreted as the volume of the parallelepiped determined in $U$ by the
vectors $u_1, \ldots, u_n$. It is easily verified that $\det (u_1, \ldots, u_n)$ is an $n$-linear
function.




Now let $v_1, \ldots, v_p$ be linearly independent vectors in $U$. They span a
$p$-dimensional subspace $V$, and the foregoing considerations can be applied to $V$,
once an orientation has been chosen for $V$. The quantity

11.10    $\det (v_1, \ldots, v_p)$

is then defined by (11.5) applied to $V$. (We define this quantity to be zero if
$v_1, \ldots, v_p$ are linearly dependent.) Reversing the orientation in $V$ changes
the sign of (11.10). Thus the absolute value

11.11    $|\det (v_1, \ldots, v_p)|$

depends only on $v_1, \ldots, v_p$ and not on the orientation of $V$.
It is clear that

11.12    $\det (v_1, \ldots, cv_i, \ldots, v_p) = c \cdot \det (v_1, \ldots, v_i, \ldots, v_p)$

However, the function $\det (v_1, \ldots, v_p)$ is not multilinear, since it involves
orienting $p$-dimensional subspaces of $U$, and there is no consistent way of orienting
them all at once except for $p = 0$, $p = n$.


If $v_1 \wedge \cdots \wedge v_p = v_1' \wedge \cdots \wedge v_p'$, then $\det (v_1, \ldots, v_p) = \det (v_1', \ldots, v_p')$.
For if $v_1 \wedge \cdots \wedge v_p = 0$, then the $v_i$, hence also the $v_i'$, are linearly dependent,
and the determinants are zero, by definition. If $v_1 \wedge \cdots \wedge v_p \neq 0$, we have
$v_1 \wedge \cdots \wedge v_p \wedge v_j' = v_1' \wedge \cdots \wedge v_p' \wedge v_j' = 0$, from which it follows that
$v_1, \ldots, v_p, v_j'$ are linearly dependent, but $v_1, \ldots, v_p$ are not, by (10.18). Hence $v_j'$ is
in the subspace spanned by $v_1, \ldots, v_p$. We conclude at once that $v_1, \ldots, v_p$
and $v_1', \ldots, v_p'$ span precisely the same subspace $V$. (This amounts to saying
that two different subspaces of $U$ cannot have the same Plücker coordinates.)
An orientation of $V$ being fixed, the equality of the two determinants above
follows from (11.9) applied to $V$ rather than $U$.

Let $v$ be a decomposable element in $\Lambda^p U$. That is, $v$ can be expressed in at
least one way as a product $v = v_1 \wedge \cdots \wedge v_p$ of vectors in $U$. If $v \neq 0$, then
those vectors span a $p$-dimensional subspace $V$. As we have just seen, if also
$v = v_1' \wedge \cdots \wedge v_p'$, then $v_1', \ldots, v_p'$ span the same subspace, and
$\det (v_1, \ldots, v_p) = \det (v_1', \ldots, v_p')$. This number therefore depends only on $v$ (and the
orientation of $V$). Accordingly we define

11.13    $\det v = \det (v_1, \ldots, v_p)$    if $v = v_1 \wedge \cdots \wedge v_p$

The absolute value $|\det v|$ depends only on the element $v$.

Let $v = v_1 \wedge \cdots \wedge v_p$ and $w = w_1 \wedge \cdots \wedge w_q$ be nonzero elements of $\Lambda^p U$
and $\Lambda^q U$, and suppose that the $p$-dimensional subspace $V$ determined by $v$ is
orthogonal to the $q$-dimensional subspace $W$ determined by $w$. We say then that
$v$, $w$ are orthogonal. Let $\{e_1, \ldots, e_p\}$ be an orthonormal base in $V$, and let
$\{e_{p+1}, \ldots, e_{p+q}\}$ be an orthonormal base in $W$. Then $\{e_1, \ldots, e_{p+q}\}$ is an
orthonormal base in the $(p + q)$-dimensional subspace $Y$ determined by $v \wedge w$.
By (11.6), (11.13) we have

$v = \pm |\det v| \cdot e_1 \wedge \cdots \wedge e_p$    in $V$
$w = \pm |\det w| \cdot e_{p+1} \wedge \cdots \wedge e_{p+q}$    in $W$
$v \wedge w = \pm |\det (v \wedge w)| \cdot e_1 \wedge \cdots \wedge e_{p+q}$    in $Y$

Hence

11.14    $|\det v| \cdot |\det w| = |\det (v \wedge w)|$    if $v$, $w$ are orthogonal


Now let $v$ be a nonzero decomposable element in $\Lambda^p U$. As we have just seen,
$v$ determines a unique $p$-dimensional subspace $V$ in $U$. Let $V^\perp$ denote its orthogonal
complement, consisting of all vectors $x$ in $U$ such that $(x, y) = 0$ for every $y$ in
$V$. $V^\perp$ is an $(n - p)$-dimensional subspace, hence corresponds to a one-dimensional
subspace $L_{V^\perp}$ in $\Lambda^{n-p} U$. If $\{w_1, \ldots, w_{n-p}\}$ is a base in $V^\perp$, then $L_{V^\perp}$
is spanned by the element $w = w_1 \wedge \cdots \wedge w_{n-p}$. We now choose the base to
satisfy the condition

11.15    $|\det w| = |\det v|$,    $\det (v \wedge w) > 0$

The second condition here merely requires that, if $v = v_1 \wedge \cdots \wedge v_p$, then
$\{v_1, \ldots, v_p, w_1, \ldots, w_{n-p}\}$ must be a positive base in $U$. Starting with any base
$\{w_1, \ldots, w_{n-p}\}$ one obtains a base meeting the conditions by simply replacing
$w_1$ by a suitable multiple $cw_1$, as is easily seen (assuming $n - p > 0$).

The base $\{w_1, \ldots, w_{n-p}\}$ in $V^\perp$ is not uniquely determined by these
conditions, but the element $w = w_1 \wedge \cdots \wedge w_{n-p}$ in $\Lambda^{n-p} U$ is uniquely determined.
For if $\{w_1', \ldots, w_{n-p}'\}$ is another base in $V^\perp$, then $w$ and $w' = w_1' \wedge \cdots \wedge w_{n-p}'$
are both in the same one-dimensional subspace $L_{V^\perp}$, and so $w' = aw$, for some
number $a$. If $w'$ satisfies (11.15), then $a = 1$, clearly. The element $w$, uniquely
determined by $v$, is called the dual of $v$ and is denoted by $*v$. Equation (11.15)
then reads

11.16    $|\det (*v)| = |\det v|$,    $\det (v \wedge *v) > 0$

For $v = 0$ we define $*v = 0$, naturally.


For any positive orthonormal base $\{e_1, \ldots, e_n\}$ in $U$ it follows at once
that $*(e_1 \wedge \cdots \wedge e_p) = e_{p+1} \wedge \cdots \wedge e_n$. More generally, if
$e_{i_1} \wedge \cdots \wedge e_{i_p} \neq 0$, that is, if the indices are distinct, then

11.17    $*(e_{i_1} \wedge \cdots \wedge e_{i_p}) = \pm e_{i_{p+1}} \wedge \cdots \wedge e_{i_n}$

where $i_{p+1}, \ldots, i_n$ are indices such that $i_1, \ldots, i_n$ is a permutation of
$1, \ldots, n$, the sign $\pm$ in (11.17) being the sign of that permutation. For example,
if $n = 3$, we have

11.18    $*(e_1 \wedge e_2) = e_3$    $*(e_1 \wedge e_3) = -e_2$    $*(e_2 \wedge e_3) = e_1$
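
Rule (11.17) is mechanical enough to code directly. The following sketch (hypothetical function names; indices 1 to $n$ as in the text; the complementary indices are listed in increasing order, which is one admissible choice) returns the sign and the complementary index set, and reproduces the three cases of (11.18).

```python
def perm_sign(seq):
    """Sign of the arrangement `seq` of distinct integers, by inversions."""
    sign = 1
    for a in range(len(seq)):
        for b in range(a + 1, len(seq)):
            if seq[a] > seq[b]:
                sign = -sign
    return sign


def star_basis(indices, n):
    """Dual of e_{i1} ^ ... ^ e_{ip} for a positive orthonormal base e_1..e_n.

    Returns (sign, rest), standing for sign * e_{r1} ^ ... ^ e_{r(n-p)}
    where rest = (r1, ..., r(n-p)) are the remaining indices in
    increasing order.
    """
    indices = tuple(indices)
    if len(set(indices)) < len(indices):
        return 0, ()  # a repeated index: the product itself is zero
    rest = tuple(k for k in range(1, n + 1) if k not in indices)
    return perm_sign(indices + rest), rest


for pair in [(1, 2), (1, 3), (2, 3)]:
    print(pair, star_basis(pair, 3))
```

The middle case returns the sign $-1$ with complementary index 2, matching $*(e_1 \wedge e_3) = -e_2$.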




The operation $*$ assigns to every nonzero decomposable element $v$ in $\Lambda^p U$ a
certain nonzero decomposable element $*v$ in $\Lambda^{n-p} U$. We naturally define $*v = 0$
if $v = 0$, and we now show that $*$ can be uniquely extended, by the requirement
of linearity, to a mapping $\Lambda^p U \to \Lambda^{n-p} U$.

We first observe that the mapping

$g: U \times \cdots \times U \to \Lambda^{n-p} U$    ($p$ factors)

defined by

11.19    $g(v_1, \ldots, v_p) = *(v_1 \wedge \cdots \wedge v_p)$

is $p$-linear. We shall prove this for the first entry $v_1$.
The equation $g(cv_1, v_2, \ldots, v_p) = c \cdot g(v_1, \ldots, v_p)$ is the same as

11.20    $*(cv_1 \wedge v_2 \wedge \cdots \wedge v_p) = c \cdot *(v_1 \wedge \cdots \wedge v_p)$

Write $v = v_1 \wedge \cdots \wedge v_p$. The equation is trivial if $c = 0$ or if $v = 0$, and we
therefore assume that they are nonzero. We have $(cv_1) \wedge v_2 \wedge \cdots \wedge v_p = c \cdot (v_1 \wedge \cdots \wedge v_p) = cv$,
and $\det (cv) = c \cdot \det v$, by (11.12). Since $v$ and $cv$ determine
the same $p$-dimensional subspace of $U$, it follows that $*v$ and $*(cv)$ can differ
only by a scalar factor. We want to show that $*(cv) = c \cdot (*v)$. Just as with
$cv$, we have $\det c(*v) = c \cdot \det (*v)$. It is trivial to verify that $c(*v)$ fulfills the
requirements (11.16) for $*(cv)$.

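Next we must show that $g(v_1 + v_1', v_2, \ldots, v_p) = g(v_1, v_2, \ldots, v_p) +
g(v_1', v_2, \ldots, v_p)$. That is, we must prove

11.21    $*[(v_1 + v_1') \wedge v_2 \wedge \cdots \wedge v_p] = *(v_1 \wedge v_2 \wedge \cdots \wedge v_p) + *(v_1' \wedge v_2 \wedge \cdots \wedge v_p)$

The equation is trivial if $v_2 \wedge \cdots \wedge v_p = 0$, and we therefore assume that
$v_2 \wedge \cdots \wedge v_p \neq 0$. Let $u_2, \ldots, u_p$ be an orthonormal base in the space
spanned by $v_2, \ldots, v_p$. Then we shall have $v_2 \wedge \cdots \wedge v_p = c \cdot u_2 \wedge \cdots \wedge u_p$
for some number $c \neq 0$. Set $u_1 = v_1 - (a_2 u_2 + \cdots + a_p u_p)$, where $a_j =
(v_1, u_j)$. That is, $u_1$ is obtained by subtracting from $v_1$ its orthogonal projection
on the subspace spanned by $u_2, \ldots, u_p$, so that $u_1$ is orthogonal to $u_2, \ldots, u_p$.
We have

$v_1 \wedge v_2 \wedge \cdots \wedge v_p = c \cdot v_1 \wedge u_2 \wedge \cdots \wedge u_p$
$= c \cdot (u_1 + a_2 u_2 + \cdots + a_p u_p) \wedge u_2 \wedge \cdots \wedge u_p$
$= c \cdot u_1 \wedge u_2 \wedge \cdots \wedge u_p$

Similarly, put $u_1' = v_1' - (a_2' u_2 + \cdots + a_p' u_p)$, where $a_j' = (v_1', u_j)$. Then

$v_1' \wedge v_2 \wedge \cdots \wedge v_p = c \cdot u_1' \wedge u_2 \wedge \cdots \wedge u_p$

and

$(v_1 + v_1') \wedge v_2 \wedge \cdots \wedge v_p = c \cdot (u_1 + u_1') \wedge u_2 \wedge \cdots \wedge u_p$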




Putting these expressions in (11.21) and using (11.20) to get rid of the factor $c$, we
find that (11.21) is equivalent to

11.22    $*[(u_1 + u_1') \wedge u_2 \wedge \cdots \wedge u_p] = *(u_1 \wedge u_2 \wedge \cdots \wedge u_p) + *(u_1' \wedge u_2 \wedge \cdots \wedge u_p)$

This is trivial if $u_1 = 0$, and we assume $u_1 \neq 0$. Set $u_1'' = u_1' - au_1$, where
$a = (u_1', u_1)/(u_1, u_1)$, so that $u_1''$ is orthogonal to $u_1$ (as well as to $u_2, \ldots, u_p$).
Writing $x = u_2 \wedge \cdots \wedge u_p$, our equation above then becomes

11.23    $*[(bu_1 + u_1'') \wedge x] = *(u_1 \wedge x) + *[(au_1 + u_1'') \wedge x]$

where $b = 1 + a$. To prove this equation we show first that

11.24    $*[(au_1 + u_1'') \wedge x] = a \cdot *(u_1 \wedge x) + *(u_1'' \wedge x)$

This is trivial if $a = 0$ or $u_1'' = 0$, and we therefore assume that they are not zero.
The vectors $u_1, u_1'', u_2, \ldots, u_p$ are then mutually orthogonal and nonzero. Let
$w_2, \ldots, w_{n-p}$ be an orthonormal base in the orthogonal complement of the
$(p + 1)$-dimensional space that they span. Then $u_1'', w_2, \ldots, w_{n-p}$ span the
space orthogonal to $u_1, u_2, \ldots, u_p$, and so

11.25    $*(u_1 \wedge x) = r \, u_1'' \wedge w_2 \wedge \cdots \wedge w_{n-p}$

for some scalar $r$. Since $u_2, \ldots, u_p$ are orthonormal, $|\det x| = 1$. Clearly
$|\det u_1| = |u_1|$. From (11.14), $|\det (u_1 \wedge x)| = |u_1|$; similarly,
$|\det (r u_1'' \wedge w_2 \wedge \cdots \wedge w_{n-p})| = |r| \cdot |u_1''|$. From (11.16) there follows $|u_1| = |r| \cdot |u_1''|$.
By similar reasoning,

11.26    $*(u_1'' \wedge x) = s \, u_1 \wedge w_2 \wedge \cdots \wedge w_{n-p}$

and

$|u_1''| = |s| \cdot |u_1|$
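From (11.25), (11.14)

$(u_1 \wedge x) \wedge *(u_1 \wedge x) = r \cdot u_1 \wedge x \wedge u_1'' \wedge w_2 \wedge \cdots \wedge w_{n-p}$
$= (-1)^{p-1} r \cdot u_1 \wedge u_1'' \wedge x \wedge w_2 \wedge \cdots \wedge w_{n-p}$
$= (-1)^{p-1} r t \cdot |u_1| \cdot |u_1''| \cdot \Omega$

where $t = +1$ if $\{u_1, u_1'', u_2, \ldots, u_p, w_2, \ldots, w_{n-p}\}$ is a positive base, and
$t = -1$ if it is a negative base. By (11.16), $(-1)^{p-1} rt > 0$. In a similar way we
obtain

$(u_1'' \wedge x) \wedge *(u_1'' \wedge x) = (-1)^p s |u_1| |u_1''| \cdot t \Omega$

and $-(-1)^{p-1} st > 0$. Hence $r \cdot s < 0$.
The right-hand side of (11.24) is

11.27    $(ar u_1'' + s u_1) \wedge w_2 \wedge \cdots \wedge w_{n-p}$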




To show that this is equal to the left member of (11.24), we first observe that
$ar u_1'' + s u_1$ is orthogonal to $au_1 + u_1''$. For

$(ar u_1'' + s u_1, \ au_1 + u_1'') = a^2 r (u_1'', u_1) + ar (u_1'', u_1'') + as (u_1, u_1) + s (u_1, u_1'')$

Now $(u_1'', u_1) = 0$. The expression reduces to

$ar \cdot |u_1''|^2 + as |u_1|^2$

Either $r$ or $s$ must be positive. If $r > 0$, then $r \cdot |u_1''| = |u_1|$, from above, and
$s \cdot |u_1| = -|u_1''|$. Then the expression above reduces to $a \cdot (|u_1| \cdot |u_1''| - |u_1| \cdot |u_1''|) = 0$. The
same argument holds if $s > 0$ and $r < 0$.

It follows that $ar u_1'' + s u_1, w_2, \ldots, w_{n-p}$ span the space orthogonal to the
space spanned by $au_1 + u_1'', u_2, \ldots, u_p$. Hence (11.27) is equal to the left
member of (11.24), apart from a possible constant factor. From (11.14),

$|\det [(au_1 + u_1'') \wedge u_2 \wedge \cdots \wedge u_p]| = |au_1 + u_1''|$

Since $u_1$, $u_1''$ are orthogonal, this is equal to $\sqrt{a^2 |u_1|^2 + |u_1''|^2}$. Similarly, the
determinant of (11.27) is equal to

$|ar u_1'' + s u_1| = \sqrt{a^2 r^2 |u_1''|^2 + s^2 |u_1|^2}$

Using the fact that $r^2 |u_1''|^2 = |u_1|^2$ and $s^2 |u_1|^2 = |u_1''|^2$, we have

$|ar u_1'' + s u_1| = |au_1 + u_1''|$

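Therefore (11.27) is equal to the left member of (11.24) apart from the sign. To
check that point, we have

$[(au_1 + u_1'') \wedge x] \wedge (ar u_1'' + s u_1) \wedge w_2 \wedge \cdots \wedge w_{n-p}$
$= a^2 r \, u_1 \wedge x \wedge u_1'' \wedge w_2 \wedge \cdots \wedge w_{n-p} + s \, u_1'' \wedge x \wedge u_1 \wedge w_2 \wedge \cdots \wedge w_{n-p}$
$= (-1)^{p-1} (a^2 r - s) \, u_1 \wedge u_1'' \wedge x \wedge w_2 \wedge \cdots \wedge w_{n-p}$
$= (-1)^{p-1} (a^2 r - s) \cdot t \cdot |u_1| \cdot |u_1''| \cdot \Omega$

and $(-1)^{p-1} (a^2 r - s) t > 0$, from above. This shows that (11.27) is equal to the
left member of (11.24), verifying (11.24).

To show that (11.23) holds, we apply (11.24) to the left member with $b$ in place
of $a$. The left side of (11.23) is then equal to

$b \cdot *(u_1 \wedge x) + *(u_1'' \wedge x)$

The right member of (11.23) is, by (11.24),

$*(u_1 \wedge x) + a \cdot *(u_1 \wedge x) + *(u_1'' \wedge x) = (1 + a) \cdot *(u_1 \wedge x) + *(u_1'' \wedge x)$

Since $1 + a = b$, this proves (11.23), hence also (11.21).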
This shows that $g$ of (11.19) is linear in the first entry. Since
$v_2 \wedge v_1 \wedge v_3 \wedge \cdots \wedge v_p = -v_1 \wedge \cdots \wedge v_p$, we have
$*(v_2 \wedge v_1 \wedge v_3 \wedge \cdots \wedge v_p) = -{*}(v_1 \wedge v_2 \wedge \cdots \wedge v_p)$, by (11.20).
Thus $g(v_1, v_2, \ldots, v_p) = -g(v_2, v_1, v_3, \ldots, v_p)$, from
which it follows that $g$ is linear in the second entry $v_2$. The same is true for the
others.


Thus $g$ is a $p$-linear mapping, and clearly $g(v_1, v_2, \ldots, v_p) = 0$ if any two
entries are equal. Therefore, by Theorem 10.1, there is a unique linear mapping
$g': \Lambda^p U \to \Lambda^{n-p} U$ such that $g(v_1, \ldots, v_p) = g'(v_1 \wedge \cdots \wedge v_p)$. That is,

$g'(v_1 \wedge \cdots \wedge v_p) = *(v_1 \wedge \cdots \wedge v_p)$

For an arbitrary element $v$ in $\Lambda^p U$ we denote the element $g'(v)$ by $*v$. Finally, for
$p = 0$, we define $*c = c\Omega$ for any scalar $c$, $\Omega$ being as in (11.8); for $p = n$, we define
$*(c\Omega) = c$. We have proved the following result:

THEOREM 11.2 Let $U$ be an oriented $n$-dimensional euclidean vector space over $R$.
For each $p = 0, 1, \ldots, n$ there is a linear mapping $*$ from $\Lambda^p U$ to $\Lambda^{n-p} U$ such
that $*v$ is given by the following conditions if $v$ is decomposable: $*v$ is orthogonal to $v$,
$|\det v| = |\det (*v)|$, and $\det (v \wedge *v) > 0$.

It is easily verified, using (11.16), that

11.28    $*(*v) = (-1)^{p(n-p)} v$    for $v$ in $\Lambda^p U$

Hence $*$ is an isomorphism. Further, $*$ can be computed from its effect on a
positive orthonormal base $\{e_1, \ldots, e_n\}$ by (11.17), since $*$ is linear.
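
Since $*$ is determined by (11.17) and linearity, the sign in (11.28) can be checked by brute force for small $n$. The sketch below is an assumption-laden illustration: elements of $\Lambda^p U$ are stored as dictionaries from increasing index tuples (indices 1 to $n$) to coefficients, and the names `star` and `check_double_star` are invented here.

```python
from itertools import combinations


def perm_sign(seq):
    """Sign of the arrangement `seq` of distinct integers, by inversions."""
    sign = 1
    for a in range(len(seq)):
        for b in range(a + 1, len(seq)):
            if seq[a] > seq[b]:
                sign = -sign
    return sign


def star(v, n):
    """Linear extension of (11.17) on a positive orthonormal base.

    `v` maps increasing index tuples to coefficients; the empty tuple
    stands for the unit of Lambda^0 U and (1, ..., n) for Omega.
    """
    out = {}
    for idx, coeff in v.items():
        rest = tuple(k for k in range(1, n + 1) if k not in idx)
        out[rest] = out.get(rest, 0) + perm_sign(idx + rest) * coeff
    return out


def check_double_star(n, p):
    """Compare *(*v) with (-1)^(p(n-p)) v on a sample element of Lambda^p U."""
    v = {idx: m + 1 for m, idx in enumerate(combinations(range(1, n + 1), p))}
    expected = {idx: (-1) ** (p * (n - p)) * c for idx, c in v.items()}
    return star(star(v, n), n) == expected


print(all(check_double_star(n, p) for n in range(1, 6) for p in range(n + 1)))
```

The cases $p = 0$ and $p = n$ are covered as well, in line with the conventions $*c = c\Omega$ and $*(c\Omega) = c$.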
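From (11.17) we have at once

11.29    $(e_{i_1} \wedge \cdots \wedge e_{i_p}) \wedge *(e_{j_1} \wedge \cdots \wedge e_{j_p}) = \delta_{j_1 \cdots j_p}^{i_1 \cdots i_p} \, \Omega$

where $\delta_{j_1 \cdots j_p}^{i_1 \cdots i_p}$ is the generalized Kronecker symbol introduced in Sec. 10. Let
us define $(u, v)$ by the formula

11.30    $u \wedge *v = (u, v) \cdot \Omega$    for $u$, $v$ in $\Lambda^p U$

Hence $(u, v) = \det (u \wedge *v)$, and $(u, v)$ is a bilinear function on $\Lambda^p U$. Equation
(11.29) becomes

11.31    $(e_{i_1} \wedge \cdots \wedge e_{i_p}, \ e_{j_1} \wedge \cdots \wedge e_{j_p}) = \delta_{j_1 \cdots j_p}^{i_1 \cdots i_p}$

If

$u = \sum_{i_1 < \cdots < i_p} u^{i_1 \cdots i_p} \, e_{i_1} \wedge \cdots \wedge e_{i_p}$

and

$v = \sum_{i_1 < \cdots < i_p} v^{i_1 \cdots i_p} \, e_{i_1} \wedge \cdots \wedge e_{i_p}$

then there follows, from (11.31),

11.32    $(u, v) = \sum_{i_1 < \cdots < i_p} u^{i_1 \cdots i_p} v^{i_1 \cdots i_p}$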




Hence $(u, v)$ is a symmetric bilinear function on $\Lambda^p U$, and $(u, u) > 0$ unless $u = 0$.
This form therefore gives $\Lambda^p U$ the structure of a euclidean vector space.
For $n = 3$ it is clear from (11.18) that

11.33    $x \times y = *(x \wedge y)$

for any $x$ and $y$ in $U$, the left side of (11.33) being as defined in Sec. 18, Chap. 14.
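
The identity (11.33) can be confirmed on coordinates. In the sketch below the names are hypothetical, coordinates are taken relative to a positive orthonormal base, and `cross` is the familiar coordinate formula for the vector product, which is assumed here to agree with the definition in Sec. 18, Chap. 14.

```python
def wedge2(x, y):
    """Coordinates of x ^ y on e1^e2, e1^e3, e2^e3 (n = 3)."""
    return {(1, 2): x[0] * y[1] - x[1] * y[0],
            (1, 3): x[0] * y[2] - x[2] * y[0],
            (2, 3): x[1] * y[2] - x[2] * y[1]}


def star2(t):
    """Dual of a 2-vector in dimension 3, term by term from (11.18)."""
    # *(e1^e2) = e3, *(e1^e3) = -e2, *(e2^e3) = e1
    return [t[2, 3], -t[1, 3], t[1, 2]]


def cross(x, y):
    """Usual coordinate formula for the vector product (assumed convention)."""
    return [x[1] * y[2] - x[2] * y[1],
            x[2] * y[0] - x[0] * y[2],
            x[0] * y[1] - x[1] * y[0]]


# hypothetical sample vectors
x, y = [3, -1, 1], [1, 1, -1]
print(star2(wedge2(x, y)) == cross(x, y))
```

Comparing `star2(wedge2(x, y))` with `cross(x, y)` entry by entry shows that the two formulas are literally the same expressions.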


The euclidean structure of $U$ therefore defines a euclidean structure in each $\Lambda^p U$.
In general, a bilinear function on a vector space $U$ defines a bilinear function in
$\Lambda^p U$, as we now show.


Then let $U$ be an $n$-dimensional vector space over a field $K$, and let $H$ denote a
bilinear function on $U$, that is, a bilinear mapping $H: U \times U \to K$. Consider the
mapping

$F: U \times \cdots \times U \to K$    ($2p$ factors)

defined by

11.34    $F(x_1, \ldots, x_p; \ y_1, \ldots, y_p) = \sum \operatorname{sign}\begin{pmatrix} 1 \cdots p \\ i_1 \cdots i_p \end{pmatrix} \cdot H(x_1, y_{i_1}) \cdots H(x_p, y_{i_p})$

the sum being over all permutations $i_1, \ldots, i_p$ of $1, \ldots, p$. The mapping is visibly a
multilinear mapping. If two adjacent $x_i$ are equal, then the right-hand side is
zero. For example, if $x_1 = x_2$, then the terms

$H(x_1, y_{i_1}) \cdot H(x_2, y_{i_2}) \cdots H(x_p, y_{i_p})$

and

$H(x_1, y_{i_2}) \cdot H(x_2, y_{i_1}) \cdots H(x_p, y_{i_p})$

are equal but occur in (11.34) with opposite signs. In a similar way one shows that
(11.34) is zero if two adjacent $y$'s are equal.


For fixed elements $y_1, \ldots, y_p$, (11.34) defines a $p$-linear mapping, sending a
$p$-tuple $x_1, \ldots, x_p$ into $F(x_1, \ldots, x_p; \ y_1, \ldots, y_p)$. By Theorem 10.1, there is
a unique mapping

$F': \Lambda^p U \times U \times \cdots \times U \to K$    ($p$ factors $U$)

which is linear in its first entry and satisfies

$F(x_1, \ldots, x_p; \ y_1, \ldots, y_p) = F'(x_1 \wedge \cdots \wedge x_p; \ y_1, \ldots, y_p)$

Since $\Lambda^p U$ is spanned by decomposable elements, one verifies easily that, for any
fixed element $u$ in $\Lambda^p U$, the quantity

$F'(u; \ y_1, \ldots, y_p)$

is linear in the $p$ entries $y_1, \ldots, y_p$ and is zero if any two adjacent $y$'s are equal.
By Theorem 10.1 again, there is a unique mapping

$H^{(p)}: \Lambda^p U \times \Lambda^p U \to K$

such that

$F'(u; \ y_1, \ldots, y_p) = H^{(p)}(u, \ y_1 \wedge \cdots \wedge y_p)$

It follows easily that $H^{(p)}$ is bilinear. In short, $H^{(p)}$ is the unique bilinear function
on $\Lambda^p U$ such that

11.35    $H^{(p)}(x_1 \wedge \cdots \wedge x_p, \ y_1 \wedge \cdots \wedge y_p) = \sum \operatorname{sign}\begin{pmatrix} 1 \cdots p \\ i_1 \cdots i_p \end{pmatrix} H(x_1, y_{i_1}) \cdots H(x_p, y_{i_p})$

If $H$ is symmetric, so is $H^{(p)}$, as is easily verified.
To compute $H^{(p)}$ in terms of bases, let $\{e_1, \ldots, e_n\}$ be a base in $U$, and put

11.36    $H(e_i, e_j) = g_{ij}$

From (11.35) we obtain

$H^{(p)}(e_{i_1} \wedge \cdots \wedge e_{i_p}, \ e_{j_1} \wedge \cdots \wedge e_{j_p}) = \sum \operatorname{sign}(s) \cdot g_{i_1 j_{s1}} \cdots g_{i_p j_{sp}}$

the sum being over all permutations $s$ of $1, \ldots, p$ (in the sum $s1, \ldots, sp$
denote the effects of $s$ on $1, \ldots, p$). From (2.7), Chap. 11, we see that

11.37    $H^{(p)}(e_{i_1} \wedge \cdots \wedge e_{i_p}, \ e_{j_1} \wedge \cdots \wedge e_{j_p}) = \begin{vmatrix} g_{i_1 j_1} & \cdots & g_{i_1 j_p} \\ \vdots & & \vdots \\ g_{i_p j_1} & \cdots & g_{i_p j_p} \end{vmatrix}$

Suppose now that $U$ is a euclidean vector space over $R$, with inner product $H(x, y)$.
If $\{e_1, \ldots, e_n\}$ is an orthonormal base, then $g_{ij} = 1$ if $i = j$ and $g_{ij} = 0$
otherwise. If $i_1 < \cdots < i_p$ and $j_1 < \cdots < j_p$, then (11.37) is clearly equal to zero
unless $i_1 = j_1, \ldots, i_p = j_p$, in which case it is equal to 1. Therefore we see that
the $p$-vectors $e_{i_1} \wedge \cdots \wedge e_{i_p}$ with $i_1 < \cdots < i_p$ form an orthonormal base in
$\Lambda^p U$ for the inner product $H^{(p)}$. From (11.31) it follows that $H^{(p)}$ in this case is the
same as the inner product defined in (11.30) by means of the dual operator $*$.
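
In the euclidean case the two descriptions of the inner product on $\Lambda^p U$ can be compared numerically: by (11.35) it is the determinant of the array $H(x_i, y_j)$, and by (11.32) it is the sum of products of coordinates on the base $e_{i_1} \wedge \cdots \wedge e_{i_p}$. The sketch below uses hypothetical names and sample vectors, written in an orthonormal base with indices from 0.

```python
from itertools import combinations


def det(m):
    """Determinant by cofactor expansion along the first row."""
    if not m:
        return 1
    total = 0
    for j, entry in enumerate(m[0]):
        minor = [row[:j] + row[j + 1:] for row in m[1:]]
        total += (-1) ** j * entry * det(minor)
    return total


def inner(x, y):
    """H(x, y) for coordinates taken in an orthonormal base."""
    return sum(a * b for a, b in zip(x, y))


def induced_inner(xs, ys):
    """H^(p)(x_1 ^ ... ^ x_p, y_1 ^ ... ^ y_p) as det [H(x_i, y_j)]."""
    return det([[inner(x, y) for y in ys] for x in xs])


def coordinates(vectors):
    """Coordinates of v_1 ^ ... ^ v_p on e_{i1} ^ ... ^ e_{ip}, i1 < ... < ip."""
    p, n = len(vectors), len(vectors[0])
    return {cols: det([[v[i] for i in cols] for v in vectors])
            for cols in combinations(range(n), p)}


xs = [[1, 0, 2, 1], [0, 1, 1, 3]]
ys = [[2, 1, 0, 0], [1, 0, 1, 1]]

by_determinant = induced_inner(xs, ys)
u, v = coordinates(xs), coordinates(ys)
by_coordinates = sum(u[k] * v[k] for k in u)
print(by_determinant == by_coordinates)
```

Taking `ys = xs` gives the determinant of the array of inner products $(x_i, x_j)$, which by (11.32) is a sum of squares and hence nonnegative.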


EXERCISES
1. Let $\{e_1, \ldots, e_n\}$ be an orthonormal base in an oriented euclidean vector
space U over R. Let x = 3e, — ey +e; and let y =e +e: —e;, Compute 
$*(x \wedge y)$.
2. Let $\{e_1, \ldots, e_4\}$ be as above, with $n = 4$. Find the Plücker coordinates of


the two-dimensional space spanned by u — e; + de, and ¥ = e, + 2 — 8e;. 
Compute $\det (u \wedge v)$.

3. Let $\{e_1, \ldots, e_4\}$ be as above. The quantities $t^{12} = 4$, $t^{13} = -1$, $t^{14} = 2$,
$t^{23} = -6$, $t^{24} = 4$, $t^{34} = 2$ are Plücker coordinates of a two-dimensional subspace.
Find a base for that subspace.




4. Let $V_1$ and $V_2$ be $p$-dimensional subspaces of an oriented $n$-dimensional
euclidean vector space $U$. Prove that $V_1 = V_2$ if they have the same Plücker
coordinates.


5. Let $U$ be as above, and let $v_1, \ldots, v_p$ be vectors in $U$, with $v_1$ orthogonal
to $v_2, \ldots, v_p$. Prove that
[wil2 © (We A 2 AY) = (HUT AEA VA DO A YD) 


6. Prove that there is a natural isomorphism from $\Lambda^p U^*$ to the dual space of
$\Lambda^p U$, $U$ being a finite-dimensional vector space over any field.


12. Skew-symmetric tensors


We consider here an $n$-dimensional vector space $U$ over a field $K$ of characteristic
zero. That is, we assume that no positive integral multiple of the unit element 1 in $K$
is zero. The fields $Q$, $R$, $C$ are fields of characteristic zero.

Consider now the tensor space

12.1    $U_0^p = U \otimes \cdots \otimes U$    ($p$ times)

In Sec. 8 we defined the notion of an alternating, or skew-symmetric, element of
$U_0^p$. Let us denote by $A^p(U)$ the set of all alternating elements of $U_0^p$. $A^p(U)$ is a
subspace of $U_0^p$.† We shall show here that $A^p(U)$ can be identified with the
exterior product $\Lambda^p U$.

Let $A_p$ denote the alternation operator defined in (8.20). Thus $A_p$ is a linear
mapping $U_0^p \to A^p(U)$, and $A_p(t) = t$ for any $t$ in $A^p(U)$. It follows at once that
$A^p(U)$ is spanned by the elements

12.2    $A_p(e_{i_1} \otimes \cdots \otimes e_{i_p})$    ($i_1 < i_2 < \cdots < i_p$)

and they form a base in $A^p(U)$, as follows easily from the fact that all the elements
$e_{i_1} \otimes \cdots \otimes e_{i_p}$ form a base in $U_0^p$.
The mapping

$h_p: U \times \cdots \times U \to A^p(U)$

defined by

12.3    $h_p(x_1, \ldots, x_p) = A_p(x_1 \otimes \cdots \otimes x_p)$

is a $p$-linear mapping, obviously, and $h_p(x_1, \ldots, x_p) = 0$ if any two adjacent $x_i$
are equal. Therefore (Theorem 10.1) there is a unique linear mapping $h_p': \Lambda^p U \to
A^p(U)$ such that

12.4    $h_p'(x_1 \wedge \cdots \wedge x_p) = A_p(x_1 \otimes \cdots \otimes x_p)$

for any $x_i$ in $U$.‡ In particular

$h_p'(e_{i_1} \wedge \cdots \wedge e_{i_p}) = A_p(e_{i_1} \otimes \cdots \otimes e_{i_p})$

† By $A^0(U)$ we understand $U_0^0 = K$. Similarly, $A^1(U)$ is $U_0^1$ itself.
‡ For $p = 0$, $h_0' = h_0$ is just the identity mapping of $\Lambda^0 U = K$ to $U_0^0 = K$.




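From this we conclude that $h_p'$ is an isomorphism. Furthermore, if $u$ is in $\Lambda^p U$
and if $v$ is in $\Lambda^q U$, then

12.5    $h_{p+q}'(u \wedge v) = A_{p+q}(h_p'(u) \otimes h_q'(v))$

By linearity it suffices to check this for decomposable elements $u = x_1 \wedge \cdots \wedge x_p$
and $v = y_1 \wedge \cdots \wedge y_q$. The right-hand side of (12.5) is then equal to

12.6    $A_{p+q}[A_p(x_1 \otimes \cdots \otimes x_p) \otimes A_q(y_1 \otimes \cdots \otimes y_q)]$

By (8.20),

$A_p(x_1 \otimes \cdots \otimes x_p) = \frac{1}{p!} \sum_\sigma \operatorname{sign}(\sigma) \, x_{\sigma(1)} \otimes \cdots \otimes x_{\sigma(p)}$

where $\sigma$ runs over all permutations of $1, \ldots, p$. Similarly, if we write $y_1 =
x_{p+1}, \ldots, y_q = x_{p+q}$, then

$A_q(y_1 \otimes \cdots \otimes y_q) = \frac{1}{q!} \sum_{\sigma'} \operatorname{sign}(\sigma') \, x_{\sigma'(p+1)} \otimes \cdots \otimes x_{\sigma'(p+q)}$

where $\sigma'$ runs over all permutations of $p + 1, \ldots, p + q$. For $\sigma$ and $\sigma'$ as above
denote by $\sigma\sigma'$ the permutation of $1, \ldots, p + q$ defined by

$\sigma\sigma'(j) = \begin{cases} \sigma(j) & (j = 1, \ldots, p) \\ \sigma'(j) & (j = p + 1, \ldots, p + q) \end{cases}$

Then (12.6) can be written as

$\frac{1}{p! \, q! \, (p+q)!} \sum_{\tau, \sigma, \sigma'} \operatorname{sign}(\tau\sigma\sigma') \, x_{\tau\sigma\sigma'(1)} \otimes \cdots \otimes x_{\tau\sigma\sigma'(p+q)}$

using the fact that $\operatorname{sign}(\tau) \operatorname{sign}(\sigma) \operatorname{sign}(\sigma') = \operatorname{sign}(\tau\sigma\sigma')$, $\tau$ here running through
all permutations of $1, \ldots, p + q$. As it does so, the element $\tau\sigma\sigma'$ also runs
through all permutations of $1, \ldots, p + q$, and there are $p! \, q!$ pairs $\sigma$, $\sigma'$. Hence
the sum above is the same as

$\frac{1}{(p+q)!} \sum_\tau \operatorname{sign}(\tau) \, x_{\tau(1)} \otimes \cdots \otimes x_{\tau(p+q)}$
$= A_{p+q}(x_1 \otimes \cdots \otimes x_{p+q})$
$= A_{p+q}(x_1 \otimes \cdots \otimes x_p \otimes y_1 \otimes \cdots \otimes y_q)$
$= h_{p+q}'(x_1 \wedge \cdots \wedge x_p \wedge y_1 \wedge \cdots \wedge y_q)$

and this is equal to the left side of (12.5) for $u = x_1 \wedge \cdots \wedge x_p$ and
$v = y_1 \wedge \cdots \wedge y_q$.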
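
The identity just proved can be checked on a small case with exact rational arithmetic. The sketch below makes several assumptions that are not in the text: tensors are stored as dictionaries from index tuples to coefficients, the names `tensor` and `alternate` are invented, indices start at 0, and the alternation operator is taken with the factor $1/p!$, the normalisation for which $A_p(t) = t$ on alternating $t$.

```python
from fractions import Fraction
from itertools import permutations
from math import factorial


def perm_sign(perm):
    """Sign of a permutation of 0..p-1, by counting inversions."""
    sign = 1
    for a in range(len(perm)):
        for b in range(a + 1, len(perm)):
            if perm[a] > perm[b]:
                sign = -sign
    return sign


def tensor(t, s):
    """Tensor product of homogeneous tensors stored as {index tuple: coefficient}."""
    return {it + js: ct * cs for it, ct in t.items() for js, cs in s.items()}


def alternate(t):
    """Alternation operator on a homogeneous tensor, with the factor 1/p!."""
    out = {}
    for idx, coeff in t.items():
        p = len(idx)
        for perm in permutations(range(p)):
            key = tuple(idx[k] for k in perm)
            out[key] = out.get(key, 0) + Fraction(perm_sign(perm), factorial(p)) * coeff
    return {k: c for k, c in out.items() if c != 0}


# three hypothetical vectors of a 3-dimensional space, on e_0, e_1, e_2
x1 = {(0,): 1, (1,): 2}
x2 = {(1,): 1, (2,): -1}
y1 = {(0,): 3, (2,): 1}

# right member of (12.5) for u = x1 ^ x2, v = y1, against A_3(x1 (x) x2 (x) y1)
left = alternate(tensor(alternate(tensor(x1, x2)), alternate(y1)))
right = alternate(tensor(tensor(x1, x2), y1))
print(left == right)
```

The same helper shows that alternating twice changes nothing: `alternate(alternate(t))` equals `alternate(t)` for the tensors above.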


If we define a product $t_p \wedge t_q$ of an element $t_p$ in $A^p(U)$ and an element $t_q$ in
$A^q(U)$ by

12.7    $t_p \wedge t_q = A_{p+q}(t_p \otimes t_q)$


then (12.5) becomes

12.8    $h_{p+q}'(u \wedge v) = h_p'(u) \wedge h_q'(v)$

Now put

12.9    $A(U) = A^0(U) \oplus A^1(U) \oplus \cdots \oplus A^n(U)$

where we understand $A^0(U) = K$ and $A^1(U) = U$. If $t$ and $t'$ are any two
elements of (12.9), then we can write uniquely

$t = t_0 + t_1 + \cdots + t_n$
$t' = t_0' + t_1' + \cdots + t_n'$

with $t_p$ and $t_p'$ in $A^p(U)$. We define the product of $t$ and $t'$ by the obvious rule

12.10    $t \wedge t' = \sum_{p, q} t_p \wedge t_q'$

each term on the right being defined by (12.7). Furthermore any $u$ in $\Lambda U$ can be
written uniquely as a sum

$u = u_0 + u_1 + \cdots + u_n$

with $u_p$ in $\Lambda^p U$. We define

12.11    $h(u) = \sum_p h_p'(u_p)$

It is clear that $h$ is a linear mapping $h: \Lambda U \to A(U)$, and

12.12    $h(u \wedge v) = h(u) \wedge h(v)$

for any two elements $u$, $v$ of $\Lambda U$, and that $h$ is compatible with the operation of
$GL(U)$.
We have proved the following theorem: 


THEOREM 12.1 Let $U$ be an $n$-dimensional vector space over a field $K$ of characteristic
zero. Let $A(U)$ denote the algebra of alternating tensors (12.9) with product defined by
(12.7). Then there is a natural isomorphism $h: \Lambda U \to A(U)$ of the exterior algebra
of $U$ to $A(U)$.

For this reason the exterior algebra of a vector space over the real field is quite
often defined to be what we have here called $A(U)$. According to Theorem 7.1,
an element in $A^p(U)$ can be unambiguously identified with a skew-symmetric
$p$-linear function on $U^*$.


Index 


Numbers in italic type indicate the reference is in an exercise 


Abelian group, 20, 171 

finite, 298-303 
Absolute value, $5 

for complex numbers, 120 

for mappings, 221 

for matrices, 234 

for polynomials, 134 

for transformations, 221 
Additive group of integers, 266 

(See also Abelian group) 
Adjoint matrix, 314 

of self-adjoint endomorphism, 470-471 
Affine group, 491 
Affine space, 187-198 
Affine subspace, 192 
Affine transformation, 195, 491 
Algebra, division, 124 

of endomorphisms, 342 

exterior, 551-561 

fundamental theorem of, 121 

K-, 175, 342 

Lie, 485 

quotient, 552 

tensor, 527-534 
Algebraic, definition, 134 
Algebraic number, 134 
Algebraically closed field, 121, 510
Alternating group, 274 
Alternating tensor, 544-545 
Analytic geometry, 205-215, 486-493 
Angle, between lines, 201 

between vectors, 461 
Antihomomorphism, 282 
Archimedean set, 117 
Argument, 126 
Arithmetic, fundamental theorem of, 59 
Associative axiom (law), 8 
Asymptotes of hyperbola, 209 
Automorphism, 219 

inner, 268, 277 


Axes, coordinate, 195 
semimajor, semiminor, 208 
Axiom (law), 1 
associative, 8 
cancellation, 24, 81 
commutative, 8, 11 
for determinants, 305 
distributive, 22, 29 
of induction, 43 


b-adic expansion, 64-66
Base of vector space, 184-187 
canonical, 185 
change of, 238-242 
definition of, 184 
dual, 227, 434 
orthogonal, 202 
orthonormal, 463 
Bernstein’s theorem, 511 
Bessel’s inequality, 467 
Bijective mapping, 495 
Bilinear form, 438 
matrix of, 439 
symmetric, 438 
Bilinear function, 438 
rank of, 440 
Bilinear mapping, 515 
Binary operation, 5, 495 
Binomial coefficients, 95 
Binomial theorem, 95-99 
Biquadratic equation, 160 
Bolzano-Weierstrass theorem, 117 
Bound, greatest lower, 111 
least upper, 111 
Bounded, 39 


Cancellation axiom (law), 24, 31 
Cancellation theorem, 10 




Cancellation theorem, for groups, 27 
Canonical base, 185, 234 
Canonical homomorphism, 278 
Cantor's first diagonal process, 89 
Cardinal number, 510 
Cartesian coordinate system, 194 
Cartesian product, 514 
Cauchy sequence, 102, 505 
for complex numbers, 129 
in vector space, 469 
Cayley-Hamilton theorem, 319, 326 
Center of group, 280 
Chain rule, 352 
Character, group, 202 
principal, 303 
Characteristic of field, 84, 99, 305 
Characteristic polynomial, 317-324 
definition of, 319 
for linear mapping, 325 
Characteristic root (see Eigenvalue) 
Characteristic subspaces, 395-399 
definition of, 395 
Circle, 207 
Class (set), 1, 2 
residue, 68, 72 
Coefficients, 133 
torsion, 303 
trinomial, 99 
undetermined, method of, 364 
Cofactor, 312 
Collection, 1 
Collinear points, 194 
Column operation, 250 
Column rank, 242 
Column vectors, 229 
Commutassociative law, 12 
Commutative axiom (law), 8, 11 
Commutative group (see Abelian group) 
Commutative ring, 22, 24, 28, 52, 342 
Commutator subgroup, 280 
Complete ordered field, 108 
Complete ordered space, 469 
Completing the square, 446 
Complex conjugate, 120 
Complex numbers, 118-132 
absolute value for, 120 
argument of, 126 
construction of, 122 
field of, definition, 118, 119 
modulus of, 120 
polar form, 126 
real, imaginary part, 120 
Complex-valued function, 347 
Component, of tensor, 535 
of vector, 184, 216 
Component vector (of matrix), 233 
Composition, of mappings, 3, 495 
series, 295 
factors of, 295 
(See also Product) 
Congruence, 66-77 
definition of, 66 
Conic section, 211 
Conjugate, complex numbers, 120 
subgroup, 278 
Conjugation, 268 
Constant function, 347 
Constant sequence, 103 
Constants, 148, 347 
of integration, 364 
Continued fractions, 101 
Contraction of tensors, 538-543 
Contravariant tensor, 529 
Convergence, of complex sequence, 129 
of real sequence, 104 
of series, 116 
Coordinate axes, 195 
Coordinate hyperplane, 196 
Coordinate system, cartesian, 194 
euclidean, 202 
Coordinates of a point, 194 
Coprime, 59, 141 
Correspondence, one-to-one, 2 
Coset, left, 258, 275, 498 
right, 278 
Cosine, direction, 203 
Euler’s definition of, 368 
law of, 200, 201, 462 
Countable set, 42 
of rational numbers, 89 
Covariant tensor, 529 
Cramer's rule, 315 
Cross product, 482-484 
Cube, 338 
Cubic equation, 159 
Cycle, 270 
Cyclic base, 381 
Cyclic group, 268 
Cyclic permutation, 271 
Cyclic subgroup, 267 


Decimals, 89-92 
definition of, 90 
expansion, of a rational number, 91 
of a real number, 113 
recurrent (periodic), 90 
terminating, 90 




Decomposable element, 562, 564 
Decomposition into primes, 60, 146 
Degree, of a field, 187 
of a polynomial, 135, 158, 317 
De Moivre's theorem, 127 
Denumerable set, 42 
Dependence, linear, 178-179 
Dependent points, 193 
Derivative, 152, 220, 348 
Derivative mapping, 382, 350 
Determinants, 304-317, 559 
axioms for, 305 
cofactor of, 312 
existence of, 309 
of linear mapping, 325 
minor of, 412 
Vandermonde, 313 
as volumes, 333-339 
Diagonal coefficients, 236 
Diagonal form, 240 
reduction to, 246-257 
Diagonal mapping, 381 
Diagonal matrix, 236, 379, 381 
Diagonal process, Cantor's, 89 
Diagonalization, of Hermitian form, 455 
of quadratic form, 449 
simultaneous, 474 
of unitary matrix, 478 
Difference, 22, 28 
Difference quotient, 349 
Differentiable function, 348 
indefinitely, 349 
Differential equations, linear, 354-370 
definition of, 354 
general solution of, 362-369 
particular solution of, 362 
systems of, 370-376 
Differentiation operator, 347-352 
definition of, 351 
Dimension, of affine space, 188 
of vector space, 180, 182-183 
Direct sum (product), 292, 380, 382 
of maps, 385, 533 
of vector spaces, 173, 177, 380, 382, 527 
Direct sum decomposition, 382-389 
Direction cosines, 203 
Directrix of a parabola, 210 
Dirichlet’s theorem on primes, 76 
Discriminant, 213, 492 
Disjoint cycles, 272 
Distance, 200, 462 
Distributive axiom, 22, 29 
Divide, 54, 140 
Division, definition of, 26, 82 


Division algebra, 124 
Division algorithm, 58, 142 
Divisors, 54 

elementary, 411-420 
Domain, integral, 24, 31 

of mapping, 495 

ordered, 33 
Dual base, 227, 434 
Dual vector space, 223, 438, 565 
Dyadic tensors, 441, 542 


e, definition of, 130, 368 
Eigenvalues, 324-333, 379 
definition of, 326 
of Hermitian matrix, 470-474 
multiplicity of, 328 
of unitary matrix, 478 
Eigenvector, 324-333, 379 
definition of, 326 
Einstein summation convention, 217 
Element, 1 
identity, 7, 8 
inverse, 9, 341 
prime, 146 
unit, 22, 29 
Elementary column operations, 250 
Elementary divisors, 411-420 
definition of, 414 
Elementary linear mappings, 380-382 
Elementary number theory, 54-62 
Elementary row operations, 248 
Elementary symmetric functions, 156 
Ellipse, 208 
Empty subset, 2 
Endomorphism, 219 
nilpotent (see Mappings) 
semisimple, 481 
Epimorphism, 218 
Equations, biquadratic, 160 
of a circle, 207 
cubic, 159 
differential (see Differential equations) 
general, of second degree, 211 
linear, 244 
parametric, of line, 191, 192, 206 
quadratic, 158 
quartic, 160 
(See also Polynomials) 
Equivalence classes, 110, 497 
Equivalence relation, 496 
Ergodic theorem, 479 
Euclid theorem on primes, 55 




Euclidean algorithm, for integers, 57 
for polynomials, 143 
Euclidean coordinate system, 202, 486 
Euclidean ring, 147 
Euclidean spaces, 198-204 
definition of, 200 
Euclidean vector space, 198, 459 
Euler function, 62, 70 
Euler's formula for sine and cosine, 368 
Even permutation, 272 
Exponent, 46-48 
Exponential complex function, 130 
Exponential of matrix, 371 
Exponential real function, 116 
Exterior algebra, 551-561 
Exterior multiplication, 552 


Factor group (quotient group), 278-282, 500 
Factorial, 46 
Factorization, of integer, 60 
of polynomial, 146 
Factors of composition, 293, 295 
Fermat’s theorem, 69, 76 
generalized, 70, 76 
Field, 81-85 
complete ordered, 108 
definition of, 25, 81 
ordered, 87 
quotient, 85-86, 503-504 
of rational functions, 163 
of rational numbers, 85-89 
of real numbers, 108-111 
skew, 124 
subfield, 27 
Finite abelian group, 298-303 
Finite group, 277 
Finite sequence, 42 
Finite set, 42, 511 
First isomorphism theorem, 278 
Focus, of ellipse, 208 
of hyperbola, 209 
of parabola, 210 
Form, bilinear, 438 
diagonal, 240, 246-257 
hermitian, 453 
linear, 433 
nondegenerate, 441 
polynomial, 133 
positive definite, 453 
quadratic (see Quadratic form) 
skew-symmetric, 438 
Fourier coefficients of a vector, 467 
Fourier series, 469 


Fractions, partial, 165-169 
Free module, 264 
Functions, bilinear, 438 
rank of, 440 

complex-valued, 347 

constant, 347 

differentiable, 348 

indefinitely, 349 

elementary symmetric, 156 

linear, 433 

multilinear, 306 

product of, 495 

real-valued, 347 

skew-symmetric, 306, 438 

symmetric, 438 

trigonometric, 367 

(See also Form; Mappings) 
Fundamental theorem, of algebra, 121 

of arithmetic, 59 

on linear equations, 244 


G-orbit of group of transformations, 283, 502 
Galois group, 297 
General equation of second degree, 211 
General linear group, 225, 532 
Generator, of equivalence class, 511 
infinitesimal, 377 
Geometric series, 116, 130 
complex, 130 
Geometry, analytic, 205-215, 486-493 
Grassmann variety, 563 
Greatest common divisor, 56 
in integral domain, 141 
Greatest lower bound, 111 
Groups, 12, 20, 266 
abelian, 20, 171 
additive, of integers, 266 
alternating, 274 
automorphism of, 266 
cyclic, 268 
finite abelian, 298-303 
Galois, 297 
general linear, 225, 532 
homomorphism of, 17, 266 
infinite cyclic, 268 
isomorphism of, 266 
Lorentz, 458 
multiplicative, 20 
one-parameter, of transformations, 377 
operation, 5, 495 
opposite, 268 
orthogonal, 456 
quotient, 278-282, 500 




Groups, simple, 291 
solvable, 297 
symmetric, 268, 270 
transformation, 282-286 
transitive, 274 
unitary, 456 
(See also Subgroup) 


H-orthogonal, 455 
H-unitary, 455 
Hermitian form, cartesian, 453 
diagonalization of, 455 
positive definite, 453 
rank of, 454 
Hermitian matrix, 454 
Highest coefficient of polynomial, 135 
Hilbert space, 470 
Homogeneous coordinates, 562 
Homomorphism, canonical (of groups), 278 
group, 17, 266 
induced, 278 
of mappings, 221-222 
module, 262 
ring, 24, 31, 344 
substitution, 137, 345, 508 
unitary, 344 
Hyperbola, 209 
Hypercomplex system (see Algebra) 
Hyperplane, 198, 333, 488 
coordinate, 196 


Ideal, 264, 412, 501 
generated by subset, 501 
left, 264 
maximal, 502 
prime, 502 
principal, 264, 428, 501 
right, 265 

Idempotent element, 343 

Idempotent mapping, 383 

Identity, Jacobi, 484 

Identity element, 7, 8 

Identity mapping, 3, 218, 495 

Image, of homomorphism, 19 
inverse, 495 
of linear mapping, 218 
of mapping, 2, 495 

Imaginary number, 120 

Imaginary units, 119 

Independence, linear, 178-179 

Independent points, 193 

Indeterminate, 134, 135, 154 




Index, of group, 276 
of permutation, 272 
Induced homomorphism, 278 
Induced mapping, 278 
Induction, 9, 48-54 
axiom of, 43 
definition by, 44 
proof by, 43-44 
Inequalities, 35 
Schwarz, 199, 459 
triangle (see Triangle inequality) 
Infinite cyclic group, 268 
Infinite sequence, 42, 90, 504 
Infinite series, 42, 116 
Infinite set, 42, 511 
Infinitesimal generator, 377 
Infinitesimal rotation, 378 
Injective mapping, 495, 534 
Inner automorphism, 268, 277 
Inner product, 199, 223, 459 
Integers, 28, 36-43 
additive group of, 266 
notation for, 62-66 
positive, 39 
Integral domain, 24, 31 
ordered, 33 
Invariant subgroup (normal subgroup), 77, 
499 
Inverse of matrix, 239 
Inverse element, 9, 341 
Inverse image, 495 
Inverse mapping, 3, 495 
Invertible element, 140, 341 
Irrational number, 110 
Irreducible matrix, 237 
Irreducible polynomial, 141 
Isomorphism, of groups, 17, 266 
of linear mappings, 218 
of rings, 31 
of vector spaces, 185 
Isomorphism theorems, first and second, 
278-279 
Isotropy subgroup (stabilizer), 284, 502 


Jacobi identity, 484 
Jordan normal form, 399-406, 409 
theorem of, 401 
uniqueness of, 406-408 
Jordan-Hölder theorem, 291-298 
statement of, 295 


K-algebra, 175, 342 
Kernel, of homomorphism, 19, 25, 92 



Kernel, of linear mapping, 218 
Klein 4-group, 293 

Kronecker delta, 230 

Kronecker symbol, generalized, 560 


Lagrange’s interpolation formula, 158 
Lagrange’s theorem, 277 
Latin square, 84 
orthogonal, 85 
Law, of cosines, 200, 201, 462 
of inertia, Sylvester's, 451 
of quadratic reciprocity, 75 
of sines, 204, 436 
Least common multiple, 61 
Least upper bound, 111 
Left coset, 275 
Left translation, 16, 20, 282 
Legendre polynomials, 332, 470 
Legendre symbol, 75, 77 
Leibniz’s formula, 351 
Length, 333 
of vectors, 198, 459, 549, 551 
Lie algebra, 485 
Limit, 348 
of sequence, 104 
of series, 116, 129 
Line, 191, 192 
oriented, 201 
orthogonal, 201 
parallel, 201 
perpendicular, 201 
Linear dependence, 178-179 
Linear differential equations (see Differential 
equations) 
Linear equations, fundamental theorem on, 
244 
Linear function, 433 
Linear group, 225 
Linear independence, 178-179 
Linear mapping, 217-221 
elementary, 380-382 
isomorphism of, 218 
Linear space (see Vector space) 
Linear transformation, 217-221 
Locus, 207 
Lorentz group, 458 
Lower bound, greatest, 111 


Magnitude (see Length) 

Mapping principle, 44 

Mappings, absolute value for, 221 
bijective, 495 


Mappings, bilinear, 515 
composition of, 3, 495 
derivative, 382, 350 
diagonal, 381 
domain of, 495 
identity, 3, 218, 495 
image of, inverse, 495 
induced, 278 
injective, 495, 534 
inverse, 3, 495 
linear, 217 
determinants of, 325 
elementary, 380-382 
minimal polynomial for, 326 

multilinear, 515 

nilcyclic, 381 

nilpotent, 382, 389-395 

one-to-one, 3, 495 

orthogonal, 459 

permutation, 3 


projection, 259 
range of, 495 
restriction, 259, 496 
scalar, 343, 380 
similar, 408, 420-423 
skew-symmetric, 310 
surjective, 495 
tensor product of, 524-527 
unitary, 459 

Mathematical induction (see Induction) 

Matrices, 25, 32, 227-239 
absolute value for, 234 
adjoint, 314 
algebra, 25, 32 
block form, 260 
diagonal, 236, 379, 381 
diagonal coefficients, 236 
exponential of, 371 
Hermitian, 454 
identity, 230 
inverse, 239 
irreducible, 237 
minimal polynomial of, 322 
minor of, 311, 312, 412 
modal, 477 
nonsingular, 239 
orthogonal, 456, 478-479 
product of, 231 
rank of, 242 
reducible, 237 
scalar, 237, 343 
scalar multiplication of, 234 
similar, 409 
singular, 239 




Matrices, skew-symmetric, 310 
square, 230 
stochastic, 333 
submatrix, 243 
sum of, 243 
symmetric, 236 
trace of, 311, 319, 538 
transpose of, 235 
triangular, 236, 252, 310 
unit, 230 
unitary, 456, 478-479 
zero, 229 
Maximal ideal, 502 
Maximal normal subgroup, 294 
Mercator's projection, 4 
Metacyclic (solvable) group, 297 
Metric, 548 
(See also Distance) 
Minimal polynomial, for linear mapping, 326 
of matrix, 322 
Minor of matrix, 311, 312, 412 
Modal matrix, 477 
Module, 261-265 
free, 264 
quotient, 262 
submodule, 262 
Modulus, 120 
Monic polynomial, 135 
Monomial, 153 
Monomorphism, 218 
Multilinear function, 306, 541 
Multilinear mapping, 515 
Multiplication (see Product) 
Multiplicity, of eigenvalues, 328 
of roots, 149 


Natural numbers, 28, 39 
Negative element, 33 
Negative rational number, 87 
Newton formulas, 158 
Nilcyclic map, 381 
Nilpotent element, 343 
Nilpotent mapping, 382, 389-395 
Nondegenerate form, 441 
Nonresidue, quadratic, 75 
Nonsingular matrix, 239 
Normal form, for ellipse, 208 

for hyperbola, 209 

Jordan (see Jordan normal form) 
Normal subgroup, 77, 499 
Normalization, 463 
Normalizer, 280 
n-tuple, 174 




n-tuple, ordered, 42 
Nullity of map, 219 
Nullspace of quadratic form, 451 
Number theory, 54-77 
Numbers, algebraic, 134 
cardinal, 510 
complex, 118-132 
irrational, 110 
prime, 55 
rational, 85-89 
real, 108-111 
transcendental, 134 


Odd permutation, 272 
One-to-one mapping, 3, 495 
One-parameter group, 377 
Operate transitively, 297 
Operator (see Binary operation; Mapping) 
Opposite group, 268 
Orbit, G-, of group of transformations, 283, 502 
s-, of cycle, 271 
Order, of cycle, 270 
of group, 277 
of group element, 267 
Ordered field, 87 
Ordered integral domain, 33 
Ordered n-tuple, 42 
Orientation, 480 
Oriented line, 201 
Oriented vector space, 481, 563 
Origin, 194 
Orthogonal, H-, 455 
Orthogonal base, 202 
Orthogonal complement, 204, 497, 463, 565 
Orthogonal group, 456 
Orthogonal Latin squares, 85 
Orthogonal lines, 201 
Orthogonal mapping, 459 
Orthogonal matrix, 456, 478-479 
Orthogonal subspace, 463 
Orthogonal vectors, 199, 457, 462 
Orthogonalization process, 204 
Orthonormal base, 201-202, 383, 463, 548 
Orthonormal vectors, 199 
set of, 463 
Orthonormalization process, Schmidt, 464 
Outer product, 482-484 


Parabola, 210 
Parallel lines, 201 
Parallelogram, 334 


Parallelotope, 337 
Parameter, 203 
Parametric equations of line, 197, 203, 205, 
487 
Partial fractions, 165-169 
Partial sums, 116 
Peano’s postulates, 28, 41 
Permutation, 3, 268-275 
cyclic, 271 
definition of, 268 
even, 272 
odd, 272 
Perpendicular distance, 210 
Perpendicular element, 466 
Perpendicular lines, 207 
Perpendicular vectors, 462 
Plane, 191-192 
Plücker coordinates, 561 
Points, 189 
dependent, 193 
independent, 193 
Polar form, of complex number, 126 
of quadratic form, 442 
Polynomials, 133-149 
absolute value for, 134 
characteristic (see Characteristic polynomial) 
definition of, 133 
irreducible, 141 
Legendre, 332 
minimal, 322, 326 
monic, 135 
reducible, 141 
ring, 135, 154 
in several variables, 153-158 
symmetric, 155 
zero of, 148, 508 
Positive definite, definition, 453 
Positive element, 33, 87 
Power (exponent), 46-48 
Prime, relatively (coprime), 59, 141 
Prime decomposition, 60, 146 
Prime element, 140 
Prime ideal, 502 
Prime number, 55 
Prime-number theorem, 76 
Principal-axis theorem, 475 
Principal character, 303 
Principal ideal, 264, 428, 501 
Product, 1, 20, 22, 29 
direct (see Direct sum) 
exterior, 552 
of functions, 495 
inner, 199, 223, 459 


Product, of matrices, 231 
outer, 482-484 
scalar (see Scalar multiplication) 
tensor (see Tensor product) 
vector, 482-484 

Projection, 259, 383, 466, 498 

Projective space, 562 


Quadratic equations, 158 
Quadratic form, 213, 441-446 
definition of, 442 
diagonalization of, 449 
nullspace of, 451 
polar form of, 442 
rank of, 450 
Quadratic nonresidue, 75 
Quadratic reciprocity, law of, 75 
Quadratic residue, 75 
Quadric surface, 476, 488 
Quartic equation, 160 
Quaternions, 123-125 
Quotient, 54, 56, 143 
difference, 349 
Quotient algebra, 552 
Quotient field, 85-86, 503-504 
Quotient group, 278-282, 500 
Quotient module, 262 
Quotient ring, 501 
Quotient set, 498 
Quotient space, 258-261 


Range of mapping, 495 
Rank, of bilinear form, 440 
column, 242 
of group, 432 
of Hermitian form, 454 
of map, 219 
of matrix, 242 
of quadratic form, 450 
row, 242 
Rational canonical form, theorem on, 429, 430 
Rational function, 163-165 
field of, 163 
Rational number, 85-89, 110 
negative, 87 
Real numbers, field of, 108-111 
Real part of complex number, 120 
Real-valued function, 347 
Reducible matrix, 237 
Reducible polynomial, 141 
Reflective relation, 496 




Relation, equivalence, 496 
reflective, 496 
Relatively prime, 59, 141 
Remainder, 56, 143 
Residue class, 68, 72 
complete set of, 68 
quadratic, 75 
reduced set of, 70 
Restriction (mapping), 259, 385, 496 
Right coset, 275 
Right translation, 16, 282 
Rigid motion, 492 
Ring, 22, 29, 341 
commutative, 22, 24, 28, 52, 342 
of endomorphisms, 342 
homomorphism of, 24, 31, 344 
of polynomials, 135, 154 
quotient, 501 
(See also Subring) 
Roots of polynomials, 148, 508 
characteristic, 319 
multiplicity of, 149 
Rotation, of axes, 212 
infinitesimal, 378 
Row operations, elementary, 248 
Row rank, 242 
Row vectors, 228 


s-orbit of cycle, 271 
Scalar, 171 
Scalar mapping, 343 
Scalar matrix, 237 
Scalar multiplication, 172 
of maps, 221 
of matrices, 234 
Schmidt process, 204, 464 
Schwarz inequality, 199, 459 
Second isomorphism theorem, 278 
Segment, 38 
Semimajor, semiminor axes, 208 
Semisimple endomorphism, 432 
Sequence, Cauchy (see Cauchy sequence) 
constant, 103 
finite, 42 
infinite, 42, 90, 504 
Series, geometric, 116, 130 
infinite, 42, 116 
of complex numbers, 129 
composition, 295 
countable (denumerable), 42 
of real numbers, 116 
Sets, 1 
Archimedean, 117 




Sets, countable, 42 
of rational numbers, 89 
finite, 42, 511 
quotient, 498 
subset, empty, 2 
union of, 43 
Shear, 336, 339 
Signature, of Hermitian form, 455 
of quadratic form, 451 
Similar mappings, 408, 420-423 
Similar matrices, 409 
Simple group, 291 
Simple p-vectors, 562 
Simultaneous diagonalization, 474 
Sine, Euler's definition of, 368 
Sines, law of, 204, 486 
Singular matrix, 239 
Skew field, 124 
Skew-symmetric form, 438 
Skew-symmetric function, 306, 438 
Skew-symmetric matrix, 310 
Skew-symmetric tensor, 544-545, 572-574 
Slope, 206 
Solvable group, 297 
Space, affine, 187-197 
Euclidean, 198-204 
projective, 562 
quotient, 258-261 
vector (see Vector space) 
Square matrix, 230 
Stabilizer, 284, 502 
Stable subspace, 259 
Stereographic projection, 2, 4 
Stochastic matrix, 234 
Subfield, 27, 83 
(See also Field) 
Subgroup, 275-282 
center, 280 
commutator, 280 
conjugate, 278 
cyclic, 267 
definition of, 266 
improper, 15 
maximal, 294 
normal, 277, 499 
proper, 15 
stabilizer (isotropy), 284, 502 
Sylow, 288 
torsion, 432 
(See also Groups) 
Submatrix, 243 
Submodule, 262 
Subring, 23, 30, 88 
(See also Ring) 


Subset, 1 
empty, 2 
Subspace, of affine space, 191 
characteristic, 395 
T-stable, 259, 385 
of vector space, 176 
Substitution homomorphism, 137, 345, 508 
Subtraction, 26, 82 
Successor of 2, 41 
Sum, direct (see Direct sum) 
of infinite series, 116 
Summation convention, Einstein, 217 
Surjective mapping, 495, 534 
Sylvester’s law of inertia, 451 
Symmetric bilinear form, 438 
Symmetric functions, 438 
elementary, 156 
Symmetric group, 268, 270 
Symmetric matrix, 236 
Symmetric polynomials, 155 
Symmetric relation, 496 
Symmetric tensor, 544-545 
Sylow subgroups, 288 
Sylow’s theorems, 286-291 
Systems of differential equations, 370-376 


T-stable subspace, 259, 385 
Taylor's series, 491 
Tensor algebra, 527-534 
contravariant, covariant, 531 
Tensor product, 515-527 
contraction of, 538-543 
of mappings, 524-527 
of more than two factors, 519-520 
Tensors, 441, 528 
alternating, 544-545 
contraction of, 538-543 
contravariant, 529 
covariant, 529 
dyadic, 441, 542 
mixed, 529 
skew-symmetric, 544-545, 572-574 
symmetric, 544-545 
Torsion coefficients, 303 
Torsion order, 414, 430, 432 
Torsion subgroup, 432 
Trace, 311, 319, 538 
Transcendental numbers, 134 
Transformations, absolute value for, 221 
affine, 195, 491 
group of, 282-286 


Transformations, volume-preserving, 492 
(See also Functions; Mappings) 
Transitive group, 274 
Transitive relation, 496 
Transitively operate, 291 
Translation, 189-190, 195, 211, 282 
of axes, 211 
left, right, of groups, 16, 20, 282 
Transpose, of map, 224, 436 
of matrix, 235 
Transposition, 273 
Triangle inequality, for absolute value of 
real numbers, 35 
for complex numbers, 120 
for Euclidean space, 201, 461 
for vectors, 200 
Triangular matrices, 236, 252, 310 
Trichotomy condition, 34 
Trinomial coefficients, 99 
Trivial solution, 180 


Uncountable, 114 
Undetermined coefficients, method of, 364 
Union of sets, 43 

Unit, imaginary, 119 

Unit element, 22, 29 

Unit matrix, 230 

Unit vector, 199 

Unitary, H-, 455 

Unitary group, 456 

Unitary homomorphism of rings, 344 
Unitary mapping, 459 

Unitary matrix, 456, 478-479 

Upper bound, least, 111 


Vandermonde determinant, 313 
Variable, 134, 135, 154 
independent, 154 
Variety, Grassmann, 563 
Vector product, 482-484 
Vector space, 171-215 
base for, 184 
dimension of, 180, 182-183 
direct sum of, 173, 177 
dual, 223, 438, 565 
Euclidean, 198, 459 
generator of, 177 
subspace of, 176 
Vectors, 171 
angle between, 461 
column, 229 
component of, 233 




Vectors, length of, 198 
orthogonal, 199, 462 
orthonormal, 199 
perpendicular, 462 
row, 228 
simple, 562 
unit, 199 
Volume-preserving transformation, 492 


Volumes, determinants as, 333-339 


Well-ordered, definition, 38 
Wilson's theorem, 70, 150 


Zero of polynomial, 148, 508 
Zero matrix, 229 




[Opening lines illegible.] 

The honor, bestowed by the Israel-based Wolf Foundation, recognizes exceptional achievement in 
agriculture, chemistry, mathematics, medicine, physics, and the arts. More than 90 past recipients 
have subsequently won the Nobel Prize in medicine, physics, or chemistry. 


[Illegible] eight individuals won prizes. Each prize comes with [amount illegible], divided evenly among winners 
of shared prizes. Mostow, who joined the Yale faculty in [illegible], 
shares the award for mathematics with Michael Artin of the Massachusetts Institute of Technology. 


Mostow [illegible] by mathematicians for his work in [illegible], particularly 
for his discovery of the phenomenon in geometry known as "strong [illegible]." 
George Daniel Mostow. In its announcement of Mostow's [illegible] in geometry [illegible] 


[illegible] investigations [illegible] in many [illegible] areas of mathematics," it concluded. 
"Few mathematicians can compete with the breadth, depth, and originality of his work." 


Yair Minsky, current chair of the Yale math department, described Mostow's work as [illegible]. 


"The Mostow Rigidity Theorem plays a fundamental role in nearly every paper on the geometry 
of [illegible] groups, and the techniques [illegible] he introduced [illegible] 
modern geometry [illegible]," Minsky said. "In our department, [illegible] 
over many decades have made him our most cherished colleague." 


Mostow was the Henry Ford II Professor of Mathematics at Yale from [illegible] to [illegible]. Born in 
[illegible], he received his Ph.D. from [illegible]. He lives in New Haven and is a frequent 
[illegible] at the Yale mathematics department, attending seminars and continuing his research in geometry. 


Jean-Pierre Meyer, professor emeritus in the Department of Mathematics at Johns Hopkins, 
[illegible]. 


Born in [illegible], France, Meyer received his undergraduate, graduate, and doctoral degrees 
from Cornell University. He served in the U.S. Army for two years, followed by a teaching 
appointment at Syracuse University and then a research associate position at [illegible] 
University. 


In [illegible], Meyer came to Johns Hopkins, where his career in the Department of Mathematics 
spanned [illegible] years. He was chair of the department from [illegible] to [illegible]. Meyer was the 
[illegible] force in establishing the Japan-U.S. Mathematics Institute, which was based at 
Johns Hopkins and still exists today. 

Meyer retired from Johns Hopkins in [illegible] but continued to attend the department's topology 
seminars. A memorial service will take place in the fall on the Homewood campus. 

